Description

[0001] The present invention relates to a method and system for signal processing, by which features representing distinct sound pictures in auditory signals are extracted from transients in the auditory signals. The result of the processing may be used for identification of sound or speech signals, for quality measurement of audio products or systems such as loudspeakers, hearing aids and telecommunication systems, or for quality measurement of acoustic conditions. The method of the present invention may also be used in connection with speech compression and decompression in narrow band telecommunication.

[0002] In prior art methods of analysing auditory signals, the signals are considered to be steady state over a short period of time, and a form of short-time spectral analysis is used under this assumption.

[0003] The human ear has the ability to simultaneously catch fast sound signals, detect sound frequencies with great 
accuracy and differentiate between sound signals in complicated sound environments. For instance it is possible to 
understand what a singer is singing in an accompaniment of musical instruments. 

[0004] In prior art methods of signal analysis, and in the method of the present invention, it is assumed that the cochlea in the human ear can be regarded as an infinite number of bandpass filters, IBP, within the frequency range of the human ear.

[0005] The time response f(t) of one bandpass filter to an excitation can be separated into two components, the transient response f_t(t) and the steady state response f_s(t):

$$f(t) = f_t(t) + f_s(t) \qquad (1)$$

[0006] Traditional signal processing is based on the steady state response f_s(t), and the transient response f_t(t) is assumed to vanish very fast and to be without importance for perception; see for example "Principles of Circuit Synthesis", McGraw-Hill 1959, Ernest S. Kuh and Donald O. Pederson, page 12, lines 9-15, where it is stated that "only the forced response is considered while the response due to the initial state of the network is ignored".
[0007] Thus, when students are introduced to signal analysis, they learn at a very early stage that the transient response, i.e. the response due to the initial state of the network, should be ignored because it vanishes within a very short period of time. Furthermore, transient signals are rather difficult to analyse with traditional linear methods of analysis.

[0008] The ability of the human ear to hear very short sounds and at the same time detect frequencies with great accuracy is in conflict with traditional filter-based spectrum analysis. The time window t_w (twice the rise time) of a bandpass filter is inversely proportional to its bandwidth:

$$t_w = \frac{2}{f_u - f_l} \qquad (2)$$

where f_l is the lower cutoff frequency and f_u is the upper cutoff frequency.

[0009] Thus, if a time window of 5 ms (a rise time of 2.5 ms) is required, the frequency resolution is no better than 400 Hz.
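Equation (2) is simple enough to evaluate directly. The sketch below, with hypothetical helper names, returns the bandwidth f_u - f_l corresponding to a given time window t_w, and to a given rise time via t_w = 2 t_r as stated in [0008].

```python
def min_bandwidth_hz(time_window_s: float) -> float:
    """Bandwidth f_u - f_l implied by equation (2): t_w = 2 / (f_u - f_l)."""
    if time_window_s <= 0:
        raise ValueError("time window must be positive")
    return 2.0 / time_window_s


def min_bandwidth_from_rise_time_hz(rise_time_s: float) -> float:
    """Same relation expressed through the rise time, using t_w = 2 * t_r."""
    return min_bandwidth_hz(2.0 * rise_time_s)


# A 5 ms time window, i.e. a 2.5 ms rise time, both give about 400 Hz.
print(min_bandwidth_hz(0.005))
print(min_bandwidth_from_rise_time_hz(0.0025))
```

Halving the time window doubles the required bandwidth, which is the trade-off the following paragraphs argue the ear must escape.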

[0010] As the detection of these transients is in conflict with high frequency resolution, the human ear must detect them in some alternative manner. It has not been examined how the human ear is able to detect these signals, but it might be that the cochlea, when no sounds are received, is in a position of rest in which it is very broad-banded. When a sound signal is received, the cochlea may start to lock itself to the frequency component or components within the signal. Thus, the cochlea may be broad-banded in its starting position, but if one or more stable frequencies are received, the cochlea may lock itself to this frequency or these frequencies with high accuracy.

[0011] Today it is known that the nerve pulses launched from the cochlea are synchronized to the frequency of a tone if the frequency is less than about 1.4 kHz. If the frequency is higher than 1.4 kHz, the pulses are launched randomly and less than once per cycle.

[0012] Signal analysis based on filter bank spectrum analysis is disclosed in GB 2213623, which describes a system for phoneme recognition. This system comprises means for detecting transient parts of a voice signal, where the principal object of the transient detection is to detect the point where the speech spectrum varies most sharply, namely a peak point. The detected peak points are used for more precise phoneme segmentation. The transient analysis of GB 2213623 is based on a spectrum analysis and the change in the spectrum, which is very different from the transient analysis of the present invention, which is based on direct transient detection in the time domain.
[0013] The present invention is based on an approach which differs in principle from all known methods for analysing auditory signals. According to the invention, it has been found that the signal information relevant to the identification of the auditory signal is present in the transient component of the signal. Thus, the method of the present invention involves separating the transient component or response of the auditory signal, generating a transient pulse corresponding to the transient component, and analysing the shape of the pulse. In an auditory signal the corresponding transient pulse may be repeated at time intervals, and the time interval of these periodic transient pulses is normally also analysed or determined.

[0014] In real life the human ear reacts to energy changes at high frequencies in order to recognize phonemes or sound pictures. In the present method, transient pulses corresponding to the energy changes observed by the ear are extracted at these high frequencies, whereafter the transient pulses are preferably transformed to the low frequency range while still maintaining the distinct features of the sound pictures or phonemes. Thus, by using the principles of the invention, it is possible to obtain distinct features within auditory signals by examining the transformed low frequency signals.

[0015] As will be understood from the following explanation of the method of the invention, the concept of extracting transient waveforms or pulse shapes makes it possible to use pre-processing methods which are much simpler than the best designs presently used, and at the same time to obtain much more valuable information about the auditory input signals.

[0016] In its broadest aspect, the invention relates to the use of the shape of energy changes of an auditory signal for identifying or representing features which can be perceived by an animal ear, such as a human ear, as representing a distinct sound picture.

[0017] Before entering into a more detailed explanation of features of the method of the invention, a few definitions will be given:

[0018] In short-time analysis the transient component in a signal is a matter of definition. The idea is to obtain an expression that gives a response corresponding to the response in the cochlea to an abrupt change in the signal energy. An abrupt change in the signal energy corresponds to the transient component in the auditory signal. Thus, in the present context, the term "transient component" designates any signal corresponding to an abrupt energy change in an auditory signal. The transient component holds the signal information to be analysed, and in order to analyse this information the transient component may be transformed to a corresponding transient pulse having a distinct shape. Thus, in the present context, the term "transient pulse" refers to a pulse having a distinct shape, substantially holding the information of the transient component of the auditory signal and thus corresponding to an abrupt change in the energy of the auditory signal. As mentioned above, the transient part of a sound signal may be repeated at time intervals; thus, in the present context, the term "periodic", when used in combination with a transient component, response or pulse, designates any transient component, response or pulse that is repeated at intervals.

[0019] The term "shape" designates any arbitrary time-varying function (time-limited or not) which, within a given time interval T_p, has a distinctly different amplitude level compared with the amplitude level outside the interval. Thus, T_p is the duration of the shape function when the shape function is time-limited, or otherwise the duration of the part of the function whose amplitude level differs distinctly from the amplitude level outside the time interval. As will be understood, the shape of a pulse is suitably identified by observing the amplitude of the pulse along its time axis.

[0020] In order to extract information from the shape of the energy changes, one broad aspect of the invention relates to representing the shape of the energy changes by the shape of a transient pulse of the signal. Several methods can be applied to obtain a transient pulse corresponding to the change in energy, but it is preferred to use envelope detection, where the envelope is preferably detected from a transient response of the energy change in the auditory signal.

[0021] The energy change representing the distinct sound picture can be a phoneme or vowel or any other sound 
which gives a sudden energy change in an auditory signal. 

[0022] It is also an aspect of the invention to provide a method for identifying, in an auditory signal, energy changes which can be perceived by an animal ear, such as a human ear, as representing a distinct sound picture, the method comprising comparing the shape of energy changes of the signal with predetermined energy change shapes representing distinct sound pictures. For the identification it is preferred that the shape of the energy changes is represented by the shape of a transient pulse of the signal, and it is furthermore preferred that the shape of the transient pulse is obtained by envelope detection of a transient response of the energy change in the auditory signal.
[0023] The invention also relates to a method for processing an auditory signal so as to reduce the bandwidth of the signal while substantially retaining the information of the signal, comprising extracting the transient component of the auditory signal and detecting an envelope of the transient component. It is preferred that transient pulse shapes of the signal which can be perceived by an animal ear, such as a human ear, as representing a distinct sound picture are identified.

[0024] It should be noted that the pulse rise time or the form of the leading edge, the duration of the pulse, and the fall time or the form of the lagging edge are all important features for identification of the pulse. In a preferred embodiment of the invention the shape of the leading edge of a pulse is identified, and it is also preferred that the shape of the leading edge is determined by determining the rise time, slope and/or slope variation of at least part of the leading edge.
[0025] In a preferred embodiment of the invention, the rise time, slope and/or slope variation of at least the top part of the leading edge is determined, since the upper part of the pulse should contain the necessary information. The top part may be defined as the part beginning substantially at the point where the slope is maximum. The top part may also be the part corresponding to the upper 50% of the amplitude of the pulse.

[0026] Several methods may be used to determine the shape of the pulse, but in a preferred embodiment the rise time, slope and/or slope variation of the leading edge is determined on the basis of at least 5 samples. However, any other suitable number of samples may be used. In another preferred method, the shape of the leading edge is identified by comparison with a library of references. Here, the references with which the comparison is made could be selected on the basis of the rise time of the leading edge.
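The leading-edge measures of [0024]-[0026] can be sketched as follows. The sketch assumes the "upper 50% of the amplitude" definition of the top part, uses first and second differences as stand-ins for slope and slope variation, enforces the 5-sample minimum from [0026], and measures the rise time from the 50% crossing to the peak. The helper name and the test pulse shape are illustrative assumptions, not part of the description.

```python
import numpy as np


def leading_edge_features(pulse, fs, top_fraction=0.5):
    """Hypothetical helper: rise time, slope and slope variation of the top
    part of a pulse's leading edge (first sample >= top_fraction * peak up to the peak)."""
    x = np.asarray(pulse, dtype=float)
    peak_idx = int(np.argmax(x))
    threshold = top_fraction * x[peak_idx]
    # Index of the first leading-edge sample that reaches the threshold.
    start_idx = int(np.argmax(x[: peak_idx + 1] >= threshold))
    edge = x[start_idx : peak_idx + 1]
    if edge.size < 5:
        raise ValueError("top part of the leading edge has fewer than 5 samples")
    dt = 1.0 / fs
    slope = np.diff(edge) / dt             # first difference, amplitude per second
    slope_variation = np.diff(slope) / dt  # second difference, change of slope
    return {
        "rise_time_s": (peak_idx - start_idx) * dt,  # threshold crossing to peak
        "mean_slope": float(slope.mean()),
        "max_slope": float(slope.max()),
        "mean_slope_variation": float(slope_variation.mean()),
        "n_samples": int(edge.size),
    }


# Assumed test pulse x * exp(1 - x) with x = t / tau, peaking at t = tau.
fs = 5000.0
tau = 0.002
t = np.arange(100) / fs
pulse = (t / tau) * np.exp(1.0 - t / tau)
features = leading_edge_features(pulse, fs)
print(features)
```

A negative mean slope variation indicates a leading edge that flattens towards the peak; such feature vectors could serve as keys when selecting references from the library mentioned in [0026].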

[0027] It is also preferred to perform an identification of the duration of the pulse, where the duration of a pulse can 
be determined as the distance from the leading edge to the lagging edge at a predetermined amplitude. 
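A minimal sketch of the duration measurement in [0027], assuming the predetermined amplitude is given as an absolute level and that linear interpolation between samples is acceptable. The name `pulse_duration` is hypothetical.

```python
import numpy as np


def pulse_duration(pulse, fs, level):
    """Hypothetical helper: time from the leading-edge crossing of `level` to the
    lagging-edge crossing, interpolating linearly between samples. Only the first
    and last crossings are used, so dips inside the pulse are ignored."""
    x = np.asarray(pulse, dtype=float)
    above = np.flatnonzero(x >= level)
    if above.size == 0:
        return 0.0  # the pulse never reaches the predetermined amplitude
    first, last = int(above[0]), int(above[-1])

    def crossing(i):
        # Fractional sample index where the line from sample i to i + 1 meets `level`.
        return i + (level - x[i]) / (x[i + 1] - x[i])

    t_on = crossing(first - 1) if first > 0 else 0.0
    t_off = crossing(last) if last < x.size - 1 else float(x.size - 1)
    return (t_off - t_on) / fs


triangle = [0, 1, 2, 3, 4, 3, 2, 1, 0]
print(pulse_duration(triangle, fs=1.0, level=2.5))
```

The fall time of the lagging edge ([0028]) could be measured in the same way by mirroring the leading-edge sketch about the peak.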
[0028] As should be understood, it is also preferred to identify the shape of the lagging edge of the transient pulse. 
[0029] The method of the present invention provides an expression for the transient conditions of the auditory signal. 
The method comprises a bandpass filtration of an auditory signal within the frequency range of the human ear and a 
detection of a lowpass filtered envelope, which envelope then can be analysed with known methods of signal analysis. 
The envelope is an expression of the transient part of the signal. 

[0030] The known method of signal analysis to be used for analysing the envelope, and the characteristics to be selected for the bandpass filter, will depend on the purpose of the analysis. The purpose may be speech recognition, quality measurement of audio products or acoustic conditions, or narrow band telecommunication.

[0031] The invention also relates to a system for processing an auditory signal to reduce the bandwidth of the signal while substantially retaining the information of the signal, comprising means for extracting the transient component of the auditory signal and means for detecting an envelope of the transient component.

[0032] Embodiments and details of the system appear from the claims and the detailed discussion of embodiments 
of the system given in connection with the figures and a mathematical description of an embodiment of the system. 
[0033] The invention will now be described in further detail in connection with a mathematical description of the 
principle of the invention and in connection with the drawing. 

Fig. 1 shows the spectra F(ω) of a bandpass filter and H(ω) of a lowpass filter,

Fig. 2 shows the zeros and the poles in the s-plane for an infinite number of bandpass filters, IBP, having identical bandwidth,

Fig. 3 shows the zeros and poles in the s-plane for an infinite number of bandpass filters, IBP, having identical Q,
Fig. 4 illustrates the impulse response for various root locations in the s-plane,
Fig. 5 shows a spectrogram for the words "linear prediction",

Fig. 6 illustrates how a summation of an infinite number of bandpass filters, IBP, can be performed by one bandpass filtration,

Fig. 7 illustrates the principle of a transient detection system according to the invention, 
Fig. 8 shows a block diagram for a transient detection system according to the invention, 
Fig. 9 shows the characteristics of a preferred highpass filter to be used in the system of Fig. 8, 
Fig. 10 shows the characteristics of a preferred lowpass filter to be used in the system of Fig. 8, 
Fig. 11 illustrates the sensitivity of the human ear, 

Fig. 12 illustrates average formant frequencies for the American vowels /i(:)/, /ae(:)/, /a(:)/, and /u(:)/, 
Fig. 13 shows the experimental results of the first transient analysis of the vowels of Fig. 12,
Fig. 14 shows processed curves of the vowel "i" as in "heat",
Fig. 15 shows curves similar to those of Fig. 14 for the vowel "o" as in "hop",
Fig. 16 shows normalized time windows for the processed curves of the vowel "i" as in "heat",
Fig. 17 shows normalized time windows for the vowel "o" as in "hop",

Fig. 18 shows normalized time windows for the vowel "a" as in "have",

Fig. 19 shows a block diagram for a speech recognition system according to the invention, and

Figs. 20-25 show transient pulses for speech synthesis of the phonemes "i" as in "heat", "o" as in "hop", "o" as in "ongaonga", "u" as in the Danish word "hus", V as in the Danish word "ese", and "y" as in the Danish word "lys", respectively.

[0034] First, a mathematical explanation of the principles of the invention is given.

[0035] A bandpass filter may be represented in the time domain by its impulse response, which can be expressed as

$$f(t) = h(t)\cos(\omega_c t) \qquad (3)$$

where h(t) is the impulse response of a lowpass filter and ω_c is the centre frequency of the bandpass filter f(t). The term cos(ω_c t) may be regarded as representing a frequency shift of the lowpass filter to a bandpass filter with centre frequency ω_c. This is illustrated in Fig. 1, where F(ω) and H(ω) are the corresponding frequency characteristics of f(t) and h(t).

[0036] Let each IBP filter be a simple bandpass filter, BP, with a zero at the origin and a pair of complex-conjugate poles in the left half of the complex s-plane, and let the poles of the IBP filters be placed on a straight line. Then:

1) If the bandwidth is identical for all the IBP filters, the rise time and the delay time will be identical for all filters, but Q = f_c/(f_u - f_l) will be proportional to the centre frequency f_c. The zeros and the poles are shown in Fig. 2.

2) If Q is identical for all filters, the rise time and the delay time will be inversely proportional to the centre frequency, while the bandwidth will be proportional to the centre frequency. The zeros and the poles are shown in Fig. 3.

[0037] It is assumed that the rise time and the delay time are identical for the IBP filters within the frequency range of interest for the analysis of the transient conditions. If this is not the case, it is assumed that the brain will compensate for it. The only effect is that the rise time will be slower and the delay time longer with falling frequencies (if Q is identical). The rhythm and the shape of the transients will be the same.

[0038] In short time analysis the transient component in a signal is a matter of definition. The idea is to get an 
expression that gives a response corresponding to the response in the cochlea to an abrupt change in the signal 
energy. An abrupt change in the signal energy corresponds to the transient component in the auditory signal. 
[0039] The composition of the transient and the steady state component in a signal may be identified by envelope detection, where the steady state component is the DC component of the detected envelope and the transient component is identified as the changes in the level of the envelope.
[0040] The transient response may be identified by envelope detection. 
[0041] The envelope of the impulse response can be expressed as 

$$f_t(t) = \left[f(t)^2 + \hat{f}(t)^2\right]^{1/2} \qquad (4)$$

where f̂(t) is the Hilbert transform of f(t).

[0042] By substituting (3) into (4) we have

$$f_t(t) = \left\{\left[h(t)\cos(\omega_c t)\right]^2 + \left[\widehat{h(t)\cos(\omega_c t)}\right]^2\right\}^{1/2} \qquad (5)$$

For the Hilbert transform we have

$$\widehat{u(t)v(t)} = u(t)\,\hat{v}(t) \qquad (6)$$

if the spectra of u(t) and v(t) do not overlap, where u(t) is the factor whose spectrum lies at the lower frequencies.

[0043] Hence we have

$$f_t(t) = \left\{\left[h(t)\cos(\omega_c t)\right]^2 + \left[h(t)\sin(\omega_c t)\right]^2\right\}^{1/2} \qquad (7)$$

and

$$f_t(t) = |h(t)| \qquad (8)$$

based on the assumption that the spectrum of h(t) does not overlap the centre frequency ω_c. Under this condition the envelope of the impulse response is independent of the centre frequency. This is illustrated in Fig. 4, which shows how different impulse responses result in the same envelope.
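Equation (8) can be checked numerically. The sketch below builds f(t) = h(t)cos(ω_c t) for two centre frequencies, computes the envelope of (4) with the usual FFT construction of the analytic signal, and compares it with |h(t)|. A Gaussian pulse is assumed for h(t) only because its spectrum is effectively confined far below ω_c, which is the non-overlap condition behind (6); it is not a causal filter response, and the sampling rate is likewise an assumption.

```python
import numpy as np


def analytic_envelope(x):
    """Envelope [x^2 + x_hat^2]^(1/2), with x_hat taken from the FFT-based
    analytic signal (negative frequencies removed, positive ones doubled)."""
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[1 : n // 2] = 2.0
        weights[n // 2] = 1.0
    else:
        weights[1 : (n + 1) // 2] = 2.0
    analytic = np.fft.ifft(np.fft.fft(x) * weights)  # x + j * x_hat
    return np.abs(analytic)


fs = 48_000
t = np.arange(4800) / fs
# Smooth lowpass-type pulse standing in for h(t); its spectrum is negligible
# above a few hundred Hz, so it does not overlap the centre frequencies used.
h = np.exp(-(((t - 0.05) / 0.001) ** 2))

errors = {}
for fc in (3000.0, 5000.0):
    f = h * np.cos(2 * np.pi * fc * t)  # equation (3)
    envelope = analytic_envelope(f)     # equation (4)
    errors[fc] = float(np.max(np.abs(envelope - np.abs(h))))  # compare with (8)
print(errors)
```

Both errors are at rounding level, so the envelope is the same for both centre frequencies, as (8) states.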

[0044] Because of (8), the total envelope of the IBP filters is the sum of the envelopes of the individual bandpass filters.

[0045] An accumulated transient response f_tt(t) can thus be expressed by summing f_t(t) over the IBP filters. This summation can be expressed as

$$f_{tt}(t) = \int_{\omega_{cl}}^{\omega_{cu}} f_t(t, \omega_c)\, d\omega_c \qquad (9)$$

and

$$f_{tt}(t) = |h(t)|\,(\omega_{cu} - \omega_{cl}) \qquad (10)$$

where ω_cl is the centre frequency of the lower IBP filter and ω_cu is the centre frequency of the upper IBP filter.
[0046] Fig. 5 shows a spectrogram for the words "linear prediction" pronounced by a man. The spectrogram is recorded with bandpass filters having a bandwidth of 300 Hz and centre frequencies in the range from about 150 Hz up to about 4 kHz. The ordinate is frequency, the abscissa is time, and the darkness of the ink indicates the signal energy. The horizontal black bands are dominant frequency bands in the speech and are called formants. The thin vertical lines correspond to abrupt energy changes and thus to the transient components of the signal. A spectrogram is usually used for formant analysis, and a bandwidth of 300 Hz is not sufficient for transient analysis, but the shape of the lines confirms that the transient signal is independent of the centre frequency of the bandpass filters.

[0047] As mentioned above, the cochlea may be regarded as having an infinite number of bandpass filters, but it would be advantageous to be able to detect the transient signal without using a large number of bandpass filters.
[0048] Fig. 6 illustrates how a summation over an infinite number of bandpass filters, IBP, can be performed by one bandpass filtration, BP, having a bandwidth that covers the cutoff frequencies of the lower and the upper IBP filters, IBP_l and IBP_u. Preferably, the bandpass filter BP should be of the maximally flat delay type, as this type of filter is well suited for preserving the shape of a transient condition.

[0049] In practice the simplest way to detect the envelope is to use a rectifier and a lowpass filter; see for example "Communication Systems: An Introduction to Signals and Noise in Electrical Communication", McGraw-Hill Kogakusha 1968, A. Bruce Carlson. From equation (10) it can be seen that the accumulated transient component may be detected by performing a highpass filtration, BP, covering the range of IBP filters to be accumulated, before the envelope detection. An envelope detection corresponds to a frequency shift by the centre frequency ω_c of the bandpass filter to a lowpass filter with half the bandwidth of the bandpass filter. This means that the cutoff frequency of the lowpass filter determines the bandwidth of all the IBP filters covered by the BP. This principle is illustrated in Fig. 7.

[0050] In Fig. 7 the digitized sound signal S(t) enters a bandpass or highpass filter BP 10; the output of the bandpass filter is input into a rectifying unit 11, the output of which is input into a lowpass filter LP 12. The output of the lowpass filter 12 is designated f_tt(t) and represents a detection of the envelope, and thus a detection of the transient response, of the sound signal S(t).
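The chain of Fig. 7 can be sketched in a few lines of Python. The cutoffs follow the preferred values given in [0058] (BP 3000-6000 Hz, LP 700 Hz), as does the one-way rectifier; everything else is an assumption: the filters are second-order sections built from the widely used Audio EQ Cookbook formulas rather than the maximally flat delay bandpass recommended in [0048], the sampling rate is 48 kHz, and the helper names are hypothetical.

```python
import numpy as np


def biquad(x, b, a):
    """Second-order IIR section in direct form I, coefficients normalised by a[0]."""
    b0, b1, b2 = (c / a[0] for c in b)
    _, a1, a2 = (c / a[0] for c in a)
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y[n] = yn
    return y


def bandpass_coeffs(f0, q, fs):
    """Audio EQ Cookbook band-pass biquad with 0 dB peak gain."""
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    return (alpha, 0.0, -alpha), (1 + alpha, -2 * np.cos(w0), 1 - alpha)


def lowpass_coeffs(fc, fs, q=1 / np.sqrt(2)):
    """Audio EQ Cookbook low-pass biquad; q = 1/sqrt(2) gives a Butterworth response."""
    w0 = 2 * np.pi * fc / fs
    alpha = np.sin(w0) / (2 * q)
    c = np.cos(w0)
    return ((1 - c) / 2, 1 - c, (1 - c) / 2), (1 + alpha, -2 * c, 1 - alpha)


def transient_signal(s, fs, f_low=3000.0, f_high=6000.0, f_lp=700.0):
    """BP 10 -> rectifying unit 11 -> LP 12; returns an estimate of f_tt(t)."""
    f0 = np.sqrt(f_low * f_high)  # geometric centre of the passband
    bp = biquad(s, *bandpass_coeffs(f0, f0 / (f_high - f_low), fs))
    rectified = np.maximum(bp, 0.0)  # one-way (half-wave) rectifier
    return biquad(rectified, *lowpass_coeffs(f_lp, fs))


# Test input: 50 ms of silence followed by 50 ms of a 4 kHz tone.
fs = 48_000
n0 = 2400
tone = np.sin(2 * np.pi * 4000 * np.arange(2400) / fs)
s = np.concatenate([np.zeros(n0), tone])
f_tt = transient_signal(s, fs)
print(f_tt[:n0].max(), f_tt[-960:].mean())
```

For the steady part of the tone the output settles to a roughly constant level, the DC component that [0039] associates with the steady state; the transient information lies in how f_tt(t) changes, for example at the tone onset.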

[0051] From the mathematical definition of the transient part of a signal it can be concluded that the poles of h(t) will be located on the negative real axis of the s-plane. This means that the impulse response will not oscillate around zero (a transient response is a non-oscillating signal). From equation (10) it can be seen that the limits ω_cl and ω_cu of the IBP filters only affect the magnitude of f_tt(t).

[0052] The bandpass filtration BP sets the limits for the summation of the transient responses of the IBP filters, and its amplitude characteristic weights the contributions from the IBP filters. If a lowpass filter were used instead of BP, the spectrum of h(t) would overlap the centre frequency of the lower IBP filter. The bandpass filter BP should have a bandwidth of at least twice the cutoff frequency of the lowpass filter LP. The bandwidth and the amplitude characteristic can be utilized to optimize different signal analyses when using the method according to the invention.

[0053] In principle, the poles of the lowpass filter LP should be located on the negative real axis for a mathematical transient detecting system. However, when dealing with auditory signals it is the characteristic of the cochlea which is decisive; but there should preferably be no significant oscillations in the impulse response, as these could make the transient conditions of the auditory signal more indistinct.

[0054] The cutoff frequency of the lowpass filter LP is an expression for the transient conditions of the signal, and for auditory signals this frequency should result in a rise time corresponding to the rise time of the cochlea. The cutoff frequency may be regarded as an index of transients: a low cutoff frequency will result in transient detection of only those signal elements having a slow rise time, whereas a high cutoff frequency will also result in detection of signal elements having a fast rise time.
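The inverse relation between cutoff frequency and rise time can be made concrete with a first-order lowpass model, whose 10-90% step-response rise time is ln(9)/(2π f_c), roughly 0.35/f_c. This model is an assumption for illustration: the second-order Butterworth LP used in the experiments has a different constant, but for any fixed filter shape the rise time scales as 1/f_c.

```python
import math


def first_order_rise_time_s(cutoff_hz: float) -> float:
    """10-90 % step-response rise time of a first-order lowpass: ln(9) / (2*pi*f_c)."""
    return math.log(9.0) / (2.0 * math.pi * cutoff_hz)


# LP cutoff range and preferred value from [0058].
for fc in (400.0, 700.0, 1200.0):
    print(f"{fc:6.0f} Hz -> {first_order_rise_time_s(fc) * 1e3:.2f} ms")
```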

[0055] The fact that the nerve pulses from the ear are synchronized to the frequency below about 1.4 kHz, and not above, indicates that the ear is tone oriented below 1.4 kHz and transient oriented above. In the transient oriented range the nerve pulses are synchronized to transients, corresponding to abrupt energy changes, in the signal.

[0056] The cutoff frequencies of the BP should correspond to the transient sensitive range of the cochlea (theoretically, its amplitude characteristic should correspond to the sensitivity curve of the ear). The sensitivity curve of human hearing indicates that the lower cutoff frequency must be about 2 kHz and the upper about 5 kHz. The amplitude characteristic of the BP filter will weight the contributions from the individual IBP filters.

[0057] From the above discussion, a transient detection and analysis system according to the invention may be constructed as shown in the block diagram of Fig. 8. In Fig. 8 a sound signal is input into a microphone 13, the output of which is passed through a lowpass filter 14 before being digitized by an A/D converter 15. The output S(t) of the A/D converter is led to a highpass or bandpass filter BP 10; the output of the bandpass filter is input into a rectifying unit 11, the output of which is input into a lowpass filter LP 12, see also Fig. 7. The output of the lowpass filter 12 is designated f_tt(t) and represents the transient components of the input signal. In order to analyse the transient components, the output signal of the lowpass filter 12 should preferably be led into equipment 16 for signal analysis or recognition.

[0058] Figs. 9 and 10 show the characteristics of a preferred highpass filter and lowpass filter to be used in the systems of Figs. 7 or 8. The bandpass filter BP to be used as the highpass filter 10 in Figs. 7 or 8 should have a lower cutoff frequency of at least 2000 Hz, preferably about 3000 Hz. The upper cutoff frequency should be in the range between 4500 and 7000 Hz, preferably about 6000 Hz. The characteristic shown in Fig. 9 has a lower cutoff frequency of 3014 Hz. The lowpass filter LP to be used in Figs. 7 or 8 should have an upper cutoff frequency in the range of 400-1200 Hz, preferably about 700 Hz. The characteristic shown in Fig. 10 has an upper cutoff frequency of 732 Hz. It would also be possible to construct a transient detection system according to Figs. 7 or 8 using a full-wave rectifier. However, it is preferred to use a one-way (half-wave) rectifier as illustrated in Figs. 7 and 8.
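The ranges in [0052] and [0058] can be collected into a small parameter check. The class and method names below are hypothetical; the limits are those stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TsdParameters:
    """Hypothetical container for the transient detector cutoffs (all in Hz)."""
    bp_lower_hz: float = 3000.0  # preferred value in [0058]
    bp_upper_hz: float = 6000.0  # preferred value in [0058]
    lp_cutoff_hz: float = 700.0  # preferred value in [0058]

    def problems(self):
        """Return descriptions of the conditions in [0052] and [0058] that are violated."""
        issues = []
        if self.bp_lower_hz < 2000.0:
            issues.append("BP lower cutoff should be at least 2000 Hz")
        if not 4500.0 <= self.bp_upper_hz <= 7000.0:
            issues.append("BP upper cutoff should lie between 4500 and 7000 Hz")
        if not 400.0 <= self.lp_cutoff_hz <= 1200.0:
            issues.append("LP cutoff should lie between 400 and 1200 Hz")
        if self.bp_upper_hz - self.bp_lower_hz < 2.0 * self.lp_cutoff_hz:
            issues.append("BP bandwidth should be at least twice the LP cutoff")
        return issues


print(TsdParameters().problems())
print(TsdParameters(bp_lower_hz=3014.0, lp_cutoff_hz=732.0).problems())
```

Both the preferred values and the Fig. 9 and Fig. 10 cutoffs (3014 Hz and 732 Hz, with the preferred 6000 Hz upper BP cutoff) satisfy every condition.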

[0059] Fig. 11 illustrates the sensitivity of the human ear, shown as the response of the cochlea to tones. As already mentioned, the perception is tone oriented up to about 1.4 kHz and transient oriented above 1.4 kHz.

[0060] As mentioned above and illustrated in Fig. 6, the total envelope for the IBP filters is obtained by summing the envelopes of the individual bandpass filters, and the summation over an infinite or high number of bandpass filters IBP can be performed by one bandpass filtration BP. This principle is used in the diagram shown in Fig. 7. However, a summation over a number of bandpass filters may also be realized by a filter bank method in which the envelopes of a number of individual bandpass filters are detected and summed. Each branch of the filter bank is then composed of a bandpass filter with a specific centre frequency, a rectifying unit and a lowpass filter, and the outputs of the lowpass filters are summed in order to obtain the total envelope.
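The filter-bank alternative can be sketched as follows: each branch is band-pass, half-wave rectifier and low-pass, and the branch outputs are summed. The branch centre frequencies (five, geometrically spaced from 3 to 6 kHz), the branch Q of 4, the Audio EQ Cookbook filter formulas, the 48 kHz sampling rate and the function names are all assumptions for illustration.

```python
import numpy as np


def biquad(x, b, a):
    """Second-order IIR section in direct form I, coefficients normalised by a[0]."""
    b0, b1, b2 = (c / a[0] for c in b)
    _, a1, a2 = (c / a[0] for c in a)
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y[n] = yn
    return y


def cookbook_biquad(kind, f0, q, fs):
    """Audio EQ Cookbook band-pass (0 dB peak gain) or low-pass coefficients."""
    w0 = 2 * np.pi * f0 / fs
    alpha, c = np.sin(w0) / (2 * q), np.cos(w0)
    a = (1 + alpha, -2 * c, 1 - alpha)
    if kind == "bandpass":
        return (alpha, 0.0, -alpha), a
    if kind == "lowpass":
        return ((1 - c) / 2, 1 - c, (1 - c) / 2), a
    raise ValueError(f"unknown filter kind: {kind}")


def filter_bank_envelope(s, fs, centres_hz, q=4.0, f_lp=700.0):
    """Sum over branches of: band-pass -> half-wave rectifier -> low-pass."""
    lp = cookbook_biquad("lowpass", f_lp, 1 / np.sqrt(2), fs)
    total = np.zeros(len(s))
    for fc in centres_hz:
        branch = biquad(s, *cookbook_biquad("bandpass", fc, q, fs))
        total += biquad(np.maximum(branch, 0.0), *lp)
    return total


fs = 48_000
centres = np.geomspace(3000.0, 6000.0, 5)  # assumed branch centre frequencies
s = np.concatenate([np.zeros(2400), np.sin(2 * np.pi * 4000 * np.arange(2400) / fs)])
envelope = filter_bank_envelope(s, fs, centres)
print(envelope[-960:].mean())
```

Because the low-pass filter is linear and identical in every branch, low-pass filtering the summed rectifier outputs once would give the same total as summing the per-branch low-pass outputs, at lower cost.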

[0061] Now, some introductory experiments illustrated by Figs. 12 and 13 will be discussed:
[0062] Two experiments were carried out in order to evaluate the cutoff frequencies of the BP and LP filters and to evaluate the suitability of the method for speech recognition.

1. Experiment by listening to an amplitude modulated signal 

[0063] To obtain a first indication of the cutoff frequency of the LP filter under controlled conditions, a listening experiment was carried out with an amplitude modulated signal in the sensitive frequency range of the ear. The experiment is somewhat artificial, because normally there would not be so intense a signal in that range, and repeating the experiment is not recommended because it is very hard on the ears.

[0064] The carrier frequency was chosen as 3.5 kHz, and the modulation tone was tuned upwards from a few Hz. Up to 350-400 Hz the envelope signal sounds like a buzz. After that it first sounds like a hollow /u(:)/, and at 800 Hz like a sharp /i(:)/. Above 800 Hz it was not possible to hear the envelope signal. If the tone is increased further, at a given point one will hear different mixed tones.

[0065] The sound was of course dominated by the carrier frequency, but the experiment indicated that the cutoff frequency of the LP filter probably has to be less than 1-1.2 kHz.

[0066] The modulation index was about 0.75. When it is greater than 1, the introduction of overtones can be observed.
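The stimulus of [0063]-[0066] can be generated as below. The sketch assumes sinusoidal modulation and a 48 kHz sampling rate, and uses the standard relation m = (max - min)/(max + min) between the extremes of the modulating factor and the modulation index; the function names are hypothetical.

```python
import numpy as np


def am_signal(fc_hz, fm_hz, m, fs, duration_s):
    """Sinusoidal AM tone (1 + m cos(2 pi fm t)) cos(2 pi fc t) and its modulating factor."""
    t = np.arange(int(round(duration_s * fs))) / fs
    modulation = 1.0 + m * np.cos(2 * np.pi * fm_hz * t)
    return modulation * np.cos(2 * np.pi * fc_hz * t), modulation


def modulation_index(modulation):
    """m = (max - min) / (max + min); valid while the modulating factor stays positive."""
    e_max, e_min = float(np.max(modulation)), float(np.min(modulation))
    return (e_max - e_min) / (e_max + e_min)


fs = 48_000
signal, modulation = am_signal(3500.0, 800.0, 0.75, fs, 0.1)
print(modulation_index(modulation))

# With m > 1 the modulating factor changes sign; the envelope |1 + m cos| is then
# no longer sinusoidal, which is one way to see where extra overtones come from.
_, over = am_signal(3500.0, 800.0, 1.2, fs, 0.1)
print(over.min() < 0)
```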

2. Analysis of transient signals for four vowels 
[0067] Selection of vowels: 

[0068] Fig. 12 shows average formant frequencies for the American vowels /i(:)/, /ae(:)/, /a(:)/, and /u(:)/ as in heed, had, hod, and who'd, for men, women, and children. These vowels represent a good dispersal among vowels, so they were selected for the experiment.

[0069] The vowels were recorded (with a Danish accent), pronounced by a man, a woman, and a child, using an ordinary cassette recorder.
[0070] Setup for the experiment: 
[0071] An analog TSD (Transient Signal Detector) was designed in accordance with Fig. 7. The design was based on the operational amplifier LM833.
[0072] The filter specifications were:

[0073] The BP filter was a fourth-order Chebyshev filter with 1 dB ripple. The upper cutoff frequency is about 6.5 kHz, and the lower is adjustable from about 550 Hz to 2.6 kHz.
[0074] The rectifier was a full-wave rectifier that inverts the negative signal and adds it to the positive signal.

[0075] The LP filter was a second-order Butterworth filter designed to have a cutoff frequency of 1.5 kHz (the 3 dB cutoff frequency was measured to be 2.1 kHz).

[0076] Recording vowels and detecting the transient signal: 

[0077] Four vowels pronounced by a man, a woman, and a child were recorded on an ordinary radio cassette recorder. The transient signal was detected by means of the TSD, converted, and stored on a PC by means of an 8-bit A/D converter. The sampling rate when recording was 10 kHz, but when analysing the recorded data only every second value was considered, resulting in a sampling rate of 5 kHz. An 8-bit A/D converter gives a poor dynamic range, and it was therefore necessary to record the vowels in isolation (that is, not within a word), which gives a less certain pronunciation.

[0078] Figs. 13a-13p show the experimental results of the first transient analysis of the vowels of Fig. 12.

[0079] It is possible to identify the vowel by listening to the transient signal. Visual inspection of the time variation of the results showed that the same vowel pronounced by a man, a woman, and a child, respectively, had almost the same characteristics, even though differences in the fundamental tone were observed. When the vowel /a(:)/ as in the Danish word "op" was recorded, a p-sound was also recorded, which is clearly visible in the time variation of the transient signal.

[0080] Analysis of the transient signals: 

[0081] The power in the transient signals varies a great deal from vowel to vowel. The signals of the vowels /a(:)/ and /u(:)/ were very weak (especially for the man's voice), so the volume of the radio cassette recorder had to be turned up to a high level, which introduced a lot of noise.
[0082] First, a number of FFT analyses of 20 ms duration at a 5 kHz sampling rate were made at different starting points in the vowels. The spectra appear very distinct and identical throughout the vowel, which strongly indicates that the signal carries important information.

[0083] In order to analyse common features, 20 ms (101 samples) were randomly chosen from each vowel. The time signals were smoothed by a Hamming window and the FFTs were calculated. Figs. 13a-13d show the power spectra, with the three voices plotted in the same diagram for each vowel. The corresponding transient signals are shown separately in Figs. 13e-13h for the woman, in Figs. 13i-13l for the man, and in Figs. 13m-13p for the child.
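The segment analysis of [0083] can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the software used in the experiments. The function names are hypothetical. Only the 5 kHz rate, the 101-sample segment and the Hamming window come from the text. At 5 kHz the bin spacing is 5000/101 ≈ 49.5 Hz, so spectral features can only be resolved to about 50 Hz.

```python
import numpy as np

FS = 5000  # Hz: effective rate after keeping every second sample ([0077])
N = 101    # samples per analysed segment ([0083])


def random_segment(signal, n=N, rng=None):
    """Pick one n-sample segment at a random start position."""
    rng = np.random.default_rng() if rng is None else rng
    start = int(rng.integers(0, len(signal) - n + 1))
    return np.asarray(signal[start:start + n], dtype=float)


def power_spectrum_db(segment, fs=FS):
    """Hamming-windowed power spectrum of one segment, in dB."""
    x = np.asarray(segment, dtype=float)
    windowed = x * np.hamming(len(x))
    spectrum = np.fft.rfft(windowed)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    power = np.abs(spectrum) ** 2
    # A tiny floor avoids log10(0) for an all-zero segment.
    return freqs, 10.0 * np.log10(power + 1e-12)
```

Given a 10 kHz recording `x10`, the reduction to 5 kHz described in [0077] is `x10[::2]`. This is plain decimation with no anti-aliasing filter. `power_spectrum_db(random_segment(x10[::2]))` then returns one spectrum.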
[0084] The spectra are expected to have the following features:

[0085] The spectra of the same vowel pronounced by three different voices will have some common features related 
to the vowel and some features related to the voice. 

[0086] The spectra of different vowels pronounced by the same voice will have some features related to the individual vowels and some features common to that voice.
[0087] Furthermore, the shape of the spectra must be expected to play a more important part than the absolute frequencies.

[0088] From the power spectra the following can be seen: 
/i(:)/ (Fig. 13a)

[0089] The most remarkable feature is that the spectra of all three voices have a pronounced peak in the frequency range 300-400 Hz, about 50 Hz wide, and a pronounced dip at 200-250 Hz. Furthermore, there is a contribution at 50 Hz. The man's voice has a contribution at 150 Hz, which must be attributed to his deep voice.
/æ(:)/ (Fig. 13b)

[0090] The voices of the woman and the man have a pronounced dip at 350 Hz (deeper than 50 dB). In this case, too, the man's voice has a contribution at 150 Hz. The child's voice does not fit the pattern as well, which may be due to uncertain pronunciation.
/a(:)/ (Fig. 13c) 

[0091] All three voices have a peak at 250-300 Hz. This frequency range is slightly lower than for /i(:)/, and the peak is less pronounced. Further, there is a major contribution at 50 Hz and below for all three voices.
/u(:)/ (Fig. 13d)

[0092] The voices of the child and the woman are very much alike. They have peaks at 300 and 350 Hz and a deep, wide valley at 100 Hz. The man's voice also has a peak. Its valley is as wide as for the woman and the child, but not as deep. The reason for this may be the deep voice and the considerable noise in the signal caused by the radio cassette recorder.

[0093] The experiments leading to the results of Figs. 13a-13p can be seen as introductory, but the results are highly interesting. This is especially so given the simple equipment used, which had a lot of noise and only an 8-bit A/D converter. In spite of this, the results are outstanding. No particular data selection was made to improve the results, and there is therefore no doubt that the transient condition is of decisive importance for speech recognition.

[0094] It seems that all the information may be located in the frequency range below 500 Hz. If so, the required sampling frequency will be less than 1.5 kHz (the sampling theorem calls for just over 1 kHz). It will then be possible to analyse the speech signal very intensively with several parallel processes. For instance, several time windows, such as 5, 20, and 40 ms, may be used. Spectrum analysis (FFT, LPC, cepstrum, or others) can then detect some phonemes, and time analysis (correlation or other methods) can detect other phonemes.

[0095] A more sophisticated TSD design is likely to give very good results and very robust, speaker-independent speech recognition. Such a design would use an AGC amplifier as preamplifier and a logarithmic or AGC amplifier after the BP filter, to compensate for variations in the energy of the bandpass-filtered phonemes. Better results may also be obtained with a 12- or 16-bit A/D converter instead of the 8-bit A/D converter.
[0096] Further experimental results, illustrated in Figs. 14-18, are discussed in the following.
[0097] The method of extracting transient signal components according to the present invention may also be regarded as a pre-process of the auditory input signal. In order to better understand and determine the parameters of the pre-process, a software programme was developed. With it, the output signals can be displayed and listened to after each step of the pre-process.

[0098] The analysis of the speech signals shown in Figs. 14 and 15 was made by means of this software programme running on a Compaq Deskpro 4/66i PC. This PC is equipped with the Microsoft Windows Sound System, a microphone, and a codec chip (AD1848) from Analog Devices. The codec chip performs the sampling, the anti-aliasing filtering, and the A/D conversion.

[0099] The speech signals shown in Figs. 14a and 15a were recorded by means of this Sound System. The speech signal is sampled at 11025 Hz with 16-bit linear PCM. The passband is greater than 4.9 kHz.
[0100] Pretransient signals are shown in Figs. 14b and 15b. These signals are the speech signals filtered by a third-order IIR digital highpass filter with a cutoff frequency of 3.0 kHz. The filter is a bilinear transformation of a third-order Butterworth filter.
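The highpass stage of [0100] can be reproduced approximately with SciPy. SciPy's `signal.butter` designs the analog Butterworth prototype and maps it to the z-plane by the bilinear transform. This sketch assumes SciPy is available. The constant and function names are illustrative, and the exact coefficients used in the original software are not known.

```python
import numpy as np
from scipy import signal

FS = 11025.0    # Hz, codec sampling rate ([0099])
FC_HP = 3000.0  # Hz, highpass cutoff ([0100])

# butter() designs an analog Butterworth prototype and maps it to the z-plane
# with the bilinear transform (pre-warped so the -3 dB point stays at FC_HP).
B_HP, A_HP = signal.butter(3, FC_HP, btype="highpass", fs=FS)


def pretransient(speech):
    """Third-order IIR highpass filtering: speech -> pretransient signal."""
    return signal.lfilter(B_HP, A_HP, np.asarray(speech, dtype=float))
```

To confirm that the response is about -3 dB at the cutoff, use `signal.freqz(B_HP, A_HP, worN=[3000.0], fs=FS)`.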

[0101] The cutoff frequency of 3.0 kHz was chosen to place the passband in the most sensitive area of the cochlea. In this case that means from 3.0 kHz to 4.9 kHz, where the upper limit of 4.9 kHz is set by the codec chip. The highpass or bandpass filter will be optimal if it has a maximally flat delay characteristic in accordance with equation (10).
[0102] The transient signals shown in Figs. 14c and 15c are the pretransient signals after rectification and filtering by a second-order IIR digital lowpass filter with a cutoff frequency of about 700 Hz. The filter is a bilinear transformation of a second-order Butterworth filter.

[0103] The lowpass filter should preserve the shape of the transient pulse corresponding to a transient response in the cochlea, and a filter that does this is an optimal filter. The nerves in the cochlea are able to fire nerve pulses at a rate of up to about 1.4 kHz. The envelope detection transforms an IBP filter bandwidth of 1.4 kHz in the transient-oriented area into a lowpass cutoff frequency of 700 Hz, that is, half the bandpass width. This is why a cutoff frequency of about 700 Hz was chosen.
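The rectification and lowpass filtering of [0102]-[0103] can be sketched in the same way. The sketch assumes SciPy, and the function and constant names are hypothetical. It uses full-wave rectification by default, as in the analog rectifier of [0074]. A one-way (half-wave) option is included because claims 36 and 59 mention one-way rectification.

```python
import numpy as np
from scipy import signal

FS = 11025.0   # Hz ([0099])
FC_LP = 700.0  # Hz: half of the ~1.4 kHz IBP bandwidth ([0103])

# Second-order Butterworth lowpass, bilinear-transformed as in [0102].
B_LP, A_LP = signal.butter(2, FC_LP, btype="lowpass", fs=FS)


def transient_signal(pretransient, full_wave=True):
    """Envelope detection: rectify, then lowpass-filter."""
    x = np.asarray(pretransient, dtype=float)
    # Full-wave keeps |x|; one-way (half-wave) keeps only the positive half.
    rectified = np.abs(x) if full_wave else np.maximum(x, 0.0)
    return signal.lfilter(B_LP, A_LP, rectified)
```

The lowpass filter has unit gain at DC, so for a steady tone the output settles at the mean of the rectified signal. That mean is about 2/π of the amplitude for full-wave rectification and 1/π for one-way rectification.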

[0104] The transient signal may be regarded as an expression for the energy change in the signal. 
[0105] All the signals presented in Figs. 14 and 15 are normalized to a maximum signal level, meaning that the largest absolute signal value equals 32766. The abscissas in Figs. 14 and 15 represent a time interval of 50 ms. The ordinates in Figs. 14a, 15a and Figs. 14b, 15b represent the sound pressure of the corresponding speech signal, whereas the ordinates of Figs. 14c, 15c represent the energy of the corresponding transient speech signal.
[0106] It is possible to listen to the speech, pretransient, and transient signals, corresponding to Figs. 14a, 15a, Figs. 14b, 15b, and Figs. 14c, 15c, respectively. One of the main requirements in selecting the filter characteristics is that, when listened to, these signals must retain a sound close to that of the original speech signal.

[0107] Referring to the system illustrated in Fig. 7, Fig. 14 shows curves of the vowel "i" as in "heat", pronounced by a man:
(a) shows the speech signal before filtering, corresponding to the digitized input signal S(t) in Fig. 7;
(b) shows the signal after highpass filtering, corresponding to the output signal of the bandpass filter 10 in Fig. 7;
(c) shows the signal after rectification and lowpass filtering, corresponding to the output signal of the lowpass filter 12 in Fig. 7.

[0108] Fig. 15 shows curves similar to those of Fig. 14 for the vowel "o" as in "hop".

[0109] The rise and fall times and the width or duration of the transient pulse are observed to be important for the sound of a vowel. Figs. 16-18 give examples of measured transient pulses. Fig. 16a shows the time window of the vowel "i" as in "heat", pronounced by a man, and corresponds to the processed signal shown in Fig. 14c. Fig. 16b shows the corresponding time window for the same vowel pronounced by a child. Figs. 16a and 16b show that the leading and lagging edges of the most dominant pulses are sharp, with rise and fall times of about 0.4 ms or less. They also show that the width of the dominant pulses is about 0.8 ms when measured at about the 50 % level.
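The pulse measurements of [0109] can be expressed as a small function. The text gives widths at about the 50 % level but does not define how rise and fall times are measured. The 10-90 % convention used here is therefore an assumption, as are the function names. The input is assumed to be one isolated, non-negative pulse cut from the transient signal.

```python
import numpy as np


def _first(mask):
    """Index of the first True in mask, or len(mask) if there is none."""
    hits = np.flatnonzero(mask)
    return int(hits[0]) if hits.size else len(mask)


def pulse_features(pulse, fs, lo=0.1, hi=0.9, level=0.5):
    """Return (rise_s, fall_s, width_s) for one isolated, non-negative pulse.

    Rise and fall are measured between the lo and hi fractions of the peak
    (10-90 % by default, an assumed convention); width is measured at
    `level` of the peak (50 %, as in [0109]).
    """
    x = np.asarray(pulse, dtype=float)
    i_peak = int(np.argmax(x))
    peak = x[i_peak]
    lead = x[: i_peak + 1]  # samples up to and including the peak
    lag = x[i_peak:]        # samples from the peak onwards

    rise = _first(lead >= hi * peak) - _first(lead >= lo * peak)
    fall = _first(lag < lo * peak) - _first(lag < hi * peak)
    width = (i_peak + _first(lag < level * peak)) - _first(lead >= level * peak)
    return rise / fs, fall / fs, width / fs
```

If the lagging edge never drops below a threshold within the cut-out window, the helper counts to the end of the window. In that case the window should be extended.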

[0110] Fig. 17a shows the time window of the vowel "o" as in "hop", pronounced by a man, and corresponds to the processed signal shown in Fig. 15c. Fig. 17b shows the corresponding time window for the same vowel pronounced by a child. Figs. 17a and 17b show that the leading and lagging edges of the most dominant pulses are sharp, with rise and fall times of about 0.5 ms. However, the width of the dominant pulses is about 1.5 ms when measured at about the 50 % level. The dip in the dominant pulses of Fig. 17b is not deep enough to influence the perception. It should be noted that the vowel "o" as in "hop" is a sharp vowel, and a softer vowel will have a slower lagging edge.

[0111] Fig. 18 shows the time window for the processed signal of the vowel "a" as in "have" when pronounced by a 
man. It is to be observed that the shape of the transient pulse has softer leading and lagging edges than the pulses 
shown in Figs. 16-17. 

[0112] Thus, from the above results it may be concluded that the perception of a vowel is given by the shape of the 
transient pulse. It is further to be concluded that by analysing the transient components or pulses which have been 
extracted from the auditory signal by way of the above mentioned method of signal processing, the vowels or phonemes 
of the speech signal may be recognised by identifying the shape of the transient pulse or pulses. 
[0113] In a vowel or phoneme the transient pulse is repeated, and the repetition frequency gives the perception of the pitch. In Fig. 16a the time period between two successive pulses is about 6 ms, corresponding to a man's pitch of about 170 Hz. In Fig. 16b it is about 3.5 ms, corresponding to a child's pitch of about 280 Hz.

[0114] Thus, it may also be concluded that the pitch of the speech signal may be determined from the time period between the transient pulses. These are the transient components or pulses extracted from the auditory signal by the above-mentioned method of signal processing.
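Pitch estimation from the pulse spacing in [0113]-[0114] reduces to simple arithmetic. A period of 6 ms gives 1/0.006 s ≈ 167 Hz (the text rounds to 170 Hz), and 3.5 ms gives about 286 Hz (the text's 280 Hz). The sketch below detects leading edges by a threshold crossing and takes the median spacing. The 50 % threshold, the 3 ms minimum spacing (borrowed from claim 24) and the function names are assumptions.

```python
import numpy as np


def pulse_onsets(transient, fs, threshold=0.5, min_gap_s=0.003):
    """Sample indices where the signal rises through threshold * max.

    Crossings closer than min_gap_s to the previous accepted onset are
    ignored, so one pulse is not counted twice; 3 ms echoes claim 24.
    """
    x = np.asarray(transient, dtype=float)
    above = x >= threshold * x.max()
    rising = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    onsets = []
    for idx in rising:
        if not onsets or idx - onsets[-1] >= min_gap_s * fs:
            onsets.append(int(idx))
    return onsets


def pitch_hz(transient, fs, **kwargs):
    """Pitch estimate as 1 / median spacing of successive pulse onsets."""
    onsets = pulse_onsets(transient, fs, **kwargs)
    if len(onsets) < 2:
        return None
    period_s = float(np.median(np.diff(onsets))) / fs
    return 1.0 / period_s
```

The median is used rather than the mean so that a single missed or spurious pulse does not shift the estimate much.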

[0115] Thus, a preferred embodiment of the present invention takes into account that the identity of the sound signal is preserved during the signal processing. This processing includes a highpass filtering followed by a rectification and a lowpass filtering of the input signal.

[0116] From the above discussion it should be understood that the present invention provides a method which is 
very suitable for use in speech recognition. 

[0117] Fig. 19 shows a block diagram of a speech recognition system according to the invention. In this system a pre-process unit 20 is provided, which comprises the bandpass filter 10, rectifying circuit 11, and lowpass filter 12 of Fig. 7. The pre-process unit is thus a transient detecting unit in accordance with the method of the present invention, and may most conveniently be integrated within a single integrated circuit or chip. The system further comprises units which are normally used in speech recognition systems. These include a pattern recognition unit 21 connected to a reference library 22, a unit 23 for phoneme determination, and a unit 24 for word/sentence determination. The system shown in Fig. 19 uses template matching, but alternative approaches may be used in a recognition system.
[0118] The reference library 22 of Fig. 19 should store a library corresponding to the shapes which can be generated by the pre-process unit 20.
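The template matching of [0117]-[0118] could be sketched as a nearest-neighbour search over a reference library of pulse shapes. The text does not specify the distance measure or how pulses are aligned. Onset alignment, peak normalisation and Euclidean distance are therefore assumptions here, and the labels in any library are hypothetical. Pulses are compared on a common time grid rather than stretched to a common length, because [0109]-[0110] indicate that absolute duration distinguishes the vowels.

```python
import numpy as np


def aligned_shape(pulse, n):
    """Peak-normalise a pulse, align it at its onset, pad/crop to n samples.

    The onset is the first sample at or above 10 % of the peak (an assumed
    rule). Absolute duration is kept, since it distinguishes vowels.
    """
    x = np.asarray(pulse, dtype=float)
    peak = x.max()
    if peak <= 0:
        return np.zeros(n)
    x = x / peak
    onset = int(np.argmax(x >= 0.1))
    seg = x[onset:onset + n]
    return np.pad(seg, (0, n - len(seg)))


def match(pulse, library, fs, window_s=0.004):
    """Label of the reference pulse closest in Euclidean distance."""
    n = int(round(window_s * fs))
    query = aligned_shape(pulse, n)
    best_label, best_dist = None, np.inf
    for label, reference in library.items():
        dist = np.linalg.norm(query - aligned_shape(reference, n))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

The 4 ms comparison window is an assumed value. It is long enough to hold the longest pulse durations given in [0130]-[0135].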

[0119] It should be understood that a single-chip pre-process unit may also comprise the lowpass filter 14 and/or the A/D converter 15 as shown in Fig. 8.

[0120] It is to be understood that a pre-process according to the present invention could be used in many other electronic systems where speech or sound analysis, recognition, coding and/or decoding is required. Examples are quality measurement of audio products or systems, such as loudspeakers, hearing aids, and telecommunication systems, and quality measurement of acoustic conditions. The pre-process may also be used in connection with speech compression and decompression in narrow band telecommunication.

[0121] As illustrated in Fig. 10, the preferred cutoff frequency of the lowpass filter 12 used in a pre-process unit should be below 1 kHz. Thus, all the necessary signal information of the auditory signals is represented within a rather narrow frequency range of 1 kHz. This should be compared to the bandwidth of around 9000 bits per second used within the GSM mobile telecommunication system for the communication of speech signals. The pre-process method or unit of the present invention should make it possible to decrease the bandwidth used for telecommunication to about 1000 bits per second, which would result in great savings within this area of communication.

[0122] Thus, it should be understood that the present method is very well suited for optimizing the bandwidth within narrow band telecommunication. It is within the scope of the invention that, when an auditory signal is transmitted in a telecommunication system, the signal is processed by the pre-process described herein before being transmitted and received by a receiver. It is preferred that the processed signal is coded into a digital representation prior to transmission. The coded signal is then decoded in the receiver so as to reestablish transient pulse shapes perceived by the animal ear, such as the human ear, as representing the distinct sound pictures of the auditory signal.

[0123] During the above-mentioned digital transmission, the bandwidth may be chosen so as to fulfil different requirements on the quality of the received, decoded, and reestablished transient pulse. Thus, a bandwidth of at most 4000 bits per second may be selected. However, it should be possible to obtain a good quality of the reestablished pulse using a bandwidth of around 2000 bits per second, and it is preferred that the bandwidth is in the interval of 800-2000 bits per second. Some telecommunication systems, such as military systems, prefer a high system performance over a high quality of the reestablished signal. For these, a bandwidth of about 400 bits per second may be selected.

[0124] When transmitting the digital signals, it is preferred that the digital information comprises information about the leading edge, lagging edge, and duration of the transient pulse representing the processed auditory signal. It is also preferred that the second and further pulses in a sequence of identical pulses are represented by a digital sign indicating repetition when transmitted.
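The coding of [0124] can be illustrated with a toy bit format. Everything about the format is an assumption: the 8-bit fields, the 0.05 ms quantiser step, and the extra period field, added because the receiver needs some timing information to place the pulses. Only two ideas come from the text: sending the leading edge, lagging edge and duration, and replacing identical repeats with a repetition sign.

```python
from dataclasses import dataclass

FIELD_BITS = 8  # assumed bits per quantised parameter
STEP_MS = 0.05  # assumed quantiser step; 8 bits then cover 0-12.75 ms


@dataclass(frozen=True)
class Pulse:
    rise_ms: float      # leading edge
    fall_ms: float      # lagging edge
    duration_ms: float
    period_ms: float    # spacing to the next pulse (assumed extra field)


def quantize(p):
    cap = 2 ** FIELD_BITS - 1
    values = (p.rise_ms, p.fall_ms, p.duration_ms, p.period_ms)
    return tuple(min(cap, round(v / STEP_MS)) for v in values)


def encode(pulses):
    """('P', fields) for a new pulse shape, ('R',) for an identical repeat."""
    records, prev = [], None
    for p in pulses:
        q = quantize(p)
        records.append(("R",) if q == prev else ("P", q))
        prev = q
    return records


def decode(records):
    """Expand repeat signs back into one quantised tuple per pulse."""
    out, prev = [], None
    for r in records:
        if r[0] == "P":
            prev = r[1]
        out.append(prev)
    return out


def bit_count(records):
    """1 flag bit per record, plus four fields for each new pulse."""
    new_bits = 1 + 4 * FIELD_BITS
    return sum(1 if r[0] == "R" else new_bits for r in records)
```

Take 1 s of a perfectly steady vowel with a 6 ms period, about 166 pulses. This toy format needs 33 + 165 = 198 bits for it. Real speech, where pulse shape and period keep changing, would need considerably more. The text's figure of 800-2000 bits per second is not derived from this sketch.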

[0125] It is also an object of the present invention to provide a method to be used in speech synthesis.
[0126] From the discussion of the experimental results of Figs. 14-18 it should be understood that the sound of each vowel or phoneme might be given by the shape of a dominating transient pulse corresponding specifically to that phoneme. Experiments have led to the conclusion that transient pulses similar to the processed pulses of Figs. 16-18 hold the information necessary to generate the sound of the phoneme.

[0127] The software developed for the transient analysis illustrated in Figs. 14-18 can also create a simple transient signal. Points are placed in a system of coordinates where the ordinate is the amplitude and the abscissa is the time in ms. A transient pulse is created by placing one or several points, interpolating between the points with either a straight line or a sine curve, and defining a period. The signal is repeated for 300 ms, and it is possible to listen to it after conversion by the D/A converter in the codec chip.
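The pulse construction of [0127] can be sketched as interpolation between user-placed points followed by periodic repetition. The text does not say exactly what "sine curve" interpolation means, so a half-cosine blend between neighbouring points is assumed. The function names are also hypothetical.

```python
import numpy as np


def make_pulse(points, fs, curve="linear"):
    """One pulse from (time_ms, amplitude) breakpoints with increasing times.

    curve="linear" joins the points with straight lines; curve="sine" uses a
    half-cosine segment between neighbouring points (an assumed reading of
    the "sine curve" interpolation in [0127]).
    """
    t_pts = np.array([p[0] for p in points], dtype=float)
    a_pts = np.array([p[1] for p in points], dtype=float)
    n = int(round(t_pts[-1] * fs / 1000.0))
    t = np.arange(n) * 1000.0 / fs  # sample times in ms
    if curve == "linear":
        return np.interp(t, t_pts, a_pts)
    seg = np.searchsorted(t_pts, t, side="right") - 1
    seg = np.clip(seg, 0, len(t_pts) - 2)
    u = (t - t_pts[seg]) / (t_pts[seg + 1] - t_pts[seg])
    w = (1.0 - np.cos(np.pi * u)) / 2.0  # eases from 0 to 1 over the segment
    return a_pts[seg] + (a_pts[seg + 1] - a_pts[seg]) * w


def repeat_pulse(pulse, fs, period_ms, total_ms=300.0):
    """Start the pulse once every period_ms for total_ms (300 ms in [0127])."""
    n_total = int(round(total_ms * fs / 1000.0))
    n_period = int(round(period_ms * fs / 1000.0))
    out = np.zeros(n_total)
    for start in range(0, n_total, n_period):
        chunk = pulse[: n_total - start]
        out[start:start + len(chunk)] += chunk
    return out
```

For example, `repeat_pulse(make_pulse([(0.0, 0.0), (0.4, 1.0), (0.8, 0.0)], 11025), 11025, period_ms=5.0)` builds 300 ms of a short, sharp pulse repeated every 5 ms. This is roughly the kind of pulse described for "i" in [0130].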

[0128] It should be noted that the pulse rise time or the form of the leading edge, the duration of the pulse, and the 
fall time or the form of the lagging edge are all important features for identification, representation and/or generation 
of transient pulses for use in speech recognition and/or synthesis. These features may also be used in connection with
speech compression. 

[0129] This is illustrated in Figs. 20-25, which show how transient pulses used for speech synthesis or identification should be formed for the following phonemes, respectively: "i" as in "heat", "o" as in "hop", "o" as in "ongaonga" or as in the Danish word "Ole", "u" as in the word "who", "o" as in the Danish word "ose", and "y" as in the Danish word "lys". The pulses are repeated with a period of 5 ms.
[0130] Fig. 20 shows that the phoneme "i" as in "heat" could be formed by a very short pulse with a duration in the range of 0.3-1.1 ms and a leading-edge rise time in the range of 0.3-0.5 ms. The fall time of the lagging edge should also be in the range of 0.3-0.5 ms.

[0131] Similarly, Fig. 21 shows that the phoneme "o" as in "hop" could be formed by a pulse with a duration in the range of 1.3-1.8 ms and a leading-edge rise time in the range of 0.3-0.5 ms. The fall time of the lagging edge should be in the range of 0.3-0.5 ms.

[0132] Fig. 22 shows that the phoneme "o" as in the Danish word "Ole" could be formed by a pulse with a duration in the range of 1.3-1.8 ms in the upper part of the pulse and a leading-edge rise time in the range of 0.3-0.5 ms. The fall time of the lagging edge for this phoneme may vary, but should be in the range of 1.0-2.0 ms.

[0133] From Fig. 23 it is observed that the phoneme "u" as in the word "who" could be formed by generating a transient pulse with a sine curve interpolation and a duration in the range of 1.0-2.0 ms. The preferred duration should be about 1.5 ms.

[0134] Fig. 24 shows the pulse of the phoneme "o" as in the Danish word "ose". Here the leading edge may have a rise time in the range of 0.4-0.6 ms. The fall time of the lagging edge should be in the range of 1.0-2.0 ms.
[0135] Fig. 25 shows the pulse of the phoneme "y" as in the Danish word "lys". Here the leading edge may have a rise time in the range of 1.0-2.0 ms. The fall time of the lagging edge should also be in the range of 1.0-2.0 ms.
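The ranges in [0130]-[0135] can be collected into a lookup table that lists which phonemes a measured pulse is compatible with. This is an illustrative sketch, and the function name is hypothetical. The table only transcribes the ranges stated in the text, with None where no range is given. For "Ole" the duration refers to the upper part of the pulse.

```python
# (low, high) ranges in ms from [0130]-[0135]; None = not specified.
PULSE_RANGES = {
    'i ("heat")': {"rise": (0.3, 0.5), "fall": (0.3, 0.5), "duration": (0.3, 1.1)},
    'o ("hop")':  {"rise": (0.3, 0.5), "fall": (0.3, 0.5), "duration": (1.3, 1.8)},
    'o ("Ole")':  {"rise": (0.3, 0.5), "fall": (1.0, 2.0), "duration": (1.3, 1.8)},
    'u ("who")':  {"rise": None,       "fall": None,       "duration": (1.0, 2.0)},
    'o ("ose")':  {"rise": (0.4, 0.6), "fall": (1.0, 2.0), "duration": None},
    'y ("lys")':  {"rise": (1.0, 2.0), "fall": (1.0, 2.0), "duration": None},
}


def candidates(rise_ms, fall_ms, duration_ms, table=PULSE_RANGES):
    """Phonemes whose stated ranges all contain the measured values."""
    measured = {"rise": rise_ms, "fall": fall_ms, "duration": duration_ms}
    return [
        phoneme
        for phoneme, ranges in table.items()
        if all(r is None or r[0] <= measured[name] <= r[1]
               for name, r in ranges.items())
    ]
```

The ranges overlap, so the lookup only narrows the candidates. For example, a pulse with a 0.4 ms rise, a 1.5 ms fall and a 1.5 ms duration fits the "Ole", "who" and "ose" entries. Finer shape features, such as the sine-interpolated edges of "u", would have to decide between them.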
[0136] When human speech is synthesized in accordance with the above-mentioned principles of the invention, it is preferred to generate a series of transient pulses corresponding to the series of phonemes which constitutes the speech to be synthesized. It is furthermore preferred that the series of phonemes is established from a series of letters using rule-based conversion.

[0137] It should be understood that the principles of the invention can also be used for quality measurement of audio products. In such a measurement a well-defined transient signal should be transmitted to the audio product, and the distortion of the response can be measured. The distortion may be measured by using a pre-process in accordance with the principles illustrated in Fig. 7.

[0138] The principles of the invention may also be used in hearing aids in order to improve noise suppression in speech signals.

[0139] A library of features representing characteristic shapes of the transient pulses may be used for identifying the speech signal and separating it from the background noise.

[0140] The experiments presented have, for the first time, shown some common features of phonemes. These features are very simple to recognize and generate, but could be of great significance within the whole area of recognition and generation of speech or auditory signals.

[0141] The performance of the method and system of the present invention is described in the time domain. It is, however, to be understood that the transient signals, components and/or pulses described in the time domain could also be given a corresponding description in the frequency domain, which would naturally be within the scope of the invention.

[0142] It is also to be noted that the methods of signal processing described above could be performed either digitally, electronically by use of analog components, mechanically, or by any combination thereof. Such methods of processing would also be within the scope of the invention.



Claims 

1. The use of the shape of abrupt energy changes of an auditory signal for identifying or representing features which can be perceived by an animal ear such as a human ear as representing a distinct sound picture, said abrupt energy changes of the auditory signal being represented by a transient pulse of a transient signal derived from the auditory signal, having a rise time of at the most 2 ms, and said shape of abrupt energy changes being represented by the shape of the transient pulse.

2. The use according to claim 1, wherein the shape of a transient pulse is obtained by use of an envelope detection.

3. The use according to any of the preceding claims, wherein the distinct sound picture is a phoneme. 

4. A method for identifying, in an auditory signal, abrupt energy changes which can be perceived by an animal ear 
such as a human ear as representing a distinct sound picture, the method comprising 

deriving, from the auditory signal, a transient signal comprising transient pulses representing abrupt energy 
changes in the auditory signal having a rise time of at the most 2 ms, 
selecting a dominant pulse of these transient pulses in the signal, and

comparing the shape of the dominant pulse with predetermined transient signal pulses representing distinct 
sound pictures. 

5. A method according to claim 4, wherein the shape of a transient pulse is obtained by an envelope detection of a 
transient response of the energy change in the auditory signal. 

6. A method for processing an auditory signal to reduce the bandwidth of the signal with substantial retention of the information of the signal, comprising extracting the transient component corresponding to an abrupt energy change of the auditory signal followed by detecting an envelope of the transient component, said envelope detection being carried out in such a manner as to obtain from the extracted transient component a transient signal comprising transient pulses with a shape representing abrupt energy changes having a rise time of at the most 2 ms.

7. A method according to claim 6, wherein transient pulse shapes of the signal which can be perceived by an animal ear such as a human ear as representing a distinct sound picture are identified.

8. A method according to claim 7, wherein the distinct sound picture is a phoneme. 

9. A method according to claim 4 or 7, wherein the shape of the leading edge of a pulse is identified. 

10. A method according to claim 9, wherein the shape of the leading edge is determined by determining rise time, 
slope and/or slope variation of at least part of the leading edge. 

11. A method according to claim 10, wherein the rise time, slope and/or slope variation of at least the top part of the 
leading edge is determined. 

12. A method according to claim 11, wherein the top part is the part beginning substantially at a point where the slope
is maximum. 

13. A method according to claim 10, wherein the rise time, slope and/or slope variation of the leading edge is determined 
on the basis of at least 5 samples. 

14. A method according to any of claims 9-13, wherein the identification of the shape of the leading edge is performed using comparison with a library of references.

15. A method according to claim 14, wherein the references with which comparison is made are selected on the basis 
of the rise time of the leading edge. 

16. A method according to claim 4 or 7, wherein the duration of a pulse is identified. 

17. A method according to claim 16, wherein the duration of a pulse is determined as the distance from the leading 
edge to the lagging edge at a predetermined amplitude. 

18. A method according to claim 17, wherein the predetermined amplitude is an amplitude of at the most 50% of the 
maximum amplitude of the pulse. 

19. A method according to any of claims 9-18 wherein pulses which cannot be perceived by the animal ear are dis- 
carded from the identification. 

20. A method according to claim 19, wherein a pulse the leading edge of which has an amplitude of less than 50% of the amplitude of the preceding pulse and an onset time of less than 3.5 ms is disregarded.

21. A method according to any of claims 9-20, wherein the shape of the lagging edge of a pulse is identified.

22. A method according to claim 21, wherein the shape of the lagging edge is determined by determining fall time, slope and/or slope variation of at least part of the lagging edge.

23. A method according to any of claims 9-21, wherein the time period between leading edges of pulses which can be perceived by the animal ear is determined.

24. A method according to claim 23, wherein the time period between leading edges which have a distance of at least 
3 ms from each other is determined. 

25. A method for telecommunicating an auditory signal, comprising processing the signal by the method according to 
any of claims 6-24, transmitting the processed signal, and receiving the processed signal by a receiver.

26. A method according to claim 25, wherein, prior to transmission of the processed signal, the signal is coded into a 
digital representation, and the coded signal is decoded in the receiver so as to reestablish transient pulse shapes 
perceived by the animal ear such as the human ear as representing the distinct sound pictures of the auditory signal. 

27. A method according to claim 26, wherein the digital transmission is performed at a bandwidth of at the most 4000 bits per second.

28. A method according to claim 27, wherein the bandwidth is at the most 2000 bits per second. 

29. A method according to claim 28, wherein the bandwidth is in the interval of 800-2000 bits per second. 

30. A method according to any of claims 26-29, wherein the digital information comprises information about leading
edge, lagging edge, and duration of the transient pulse. 

31. A method according to any of claims 26-30, wherein the second and further pulses in a sequence of identical pulses are represented by a digital sign indicating repetition.

32. A method according to any of the claims 6-24, wherein the extraction of the transient component comprises a bandpass
filtration or a highpass filtration. 

33. A method according to any of the claims 6-24 or 32, wherein the envelope detection comprises a rectification and 
a lowpass filtration. 

34. A method according to claim 32, wherein the lower cutoff frequency of the bandpass or highpass filtration is at 
least 2 kHz, such as about 3 kHz. 

35. A method according to claim 32 or 34, wherein the upper cutoff frequency is in the range between 4.5 and 7 kHz, 
preferably about 6 kHz. 

36. A method according to claim 33, wherein the rectification is a one-way rectification. 

37. A method according to claim 33 or 36, wherein the cutoff frequency of the lowpass filtration is in the range of
400-1000 Hz, preferably about 700 Hz. 

38. A method according to any of the claims 6-24 or 32, wherein the envelope detection comprises bandpass filtration 
by use of a bank of filters. 

39. A method of identifying or representing the phoneme "i" as in "heat", comprising identifying or generating a transient 
pulse with a rise time of the leading edge of at the most 0.5 ms and a duration of at the most 1.1 ms. 

40. A method according to claim 39, wherein the rise time of the leading edge is at the most 0.4 ms, preferably at the 
most 0.3 ms. 

41. A method according to claim 39 or 40, wherein the duration is at the most 1.0 ms, preferably about 0.8 ms. 

42. A method of identifying or representing the phoneme "o" as in "hop", comprising identifying or generating a transient 
pulse with a rise time of the leading edge of at the most 0.5 ms and a duration of 1.3-1.8 ms.

43. A method according to claim 42, wherein the rise time of the leading edge is at the most 0.4 ms, preferably at the 
most 0.3 ms. 
44. A method according to claim 39 or 40, wherein the fall time of the lagging edge is at the most 0.5 ms, preferably
at the most 0.4 ms and more preferably at the most 0.3 ms. 

45. A method of identifying or representing the phoneme "o" as in the English word "ongaonga" or the Danish word "Ole", comprising identifying or generating a transient pulse with a rise time of the leading edge of at the most 0.5 ms and a duration of 1.3-1.8 ms.

46. A method of identifying or representing the phoneme "u" as in the English word "who", comprising identifying or generating a transient pulse with a sine curve interpolation and a duration of 1.0-2.0 ms, preferably about 1.5 ms.


47. A method according to any of the claims 1-24 or 39-46, when used in speech recognition.

48. A method according to any of the claims 1-5 or 39-46, used in speech compression. 

49. A method according to any of the claims 1-5 or 39-46, when used for synthesizing human speech, comprising generating a series of transient pulses corresponding to the series of phonemes which constitutes the speech to be synthesized.

50. A method according to claim 49, wherein the series of phonemes is established from a series of letters using rule-based
conversion.
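Claims 49 and 50 describe synthesis as a chain from letters to phonemes (by rule) and from phonemes to a series of transient pulses. The toy sketch below assumes a hypothetical rule table covering only the three vowels named in claims 39 to 46, and hypothetical pulse descriptions ("sharp" meaning a rise time within the 0.5 ms limit, "sine" meaning the sine-curve pulse of claim 46) with durations chosen inside the claimed ranges. None of these tables come from the source.

```python
from typing import Dict, List, Tuple

# Hypothetical grapheme-to-phoneme rules, tried in order (longer patterns first).
RULES: List[Tuple[str, str]] = [
    ("ee", "i"), ("ea", "i"), ("oo", "u"), ("o", "o"), ("u", "u"), ("i", "i"),
]

# Hypothetical pulse descriptions (shape, duration in ms) inside the claimed ranges.
PULSES: Dict[str, Tuple[str, float]] = {
    "i": ("sharp", 0.8),
    "o": ("sharp", 1.5),
    "u": ("sine", 1.5),
}


def letters_to_phonemes(word: str) -> List[str]:
    """Rule-based conversion of letters to the vowel phonemes covered by RULES."""
    word, pos, phonemes = word.lower(), 0, []
    while pos < len(word):
        for pattern, phoneme in RULES:
            if word.startswith(pattern, pos):
                phonemes.append(phoneme)
                pos += len(pattern)
                break
        else:
            pos += 1  # letters without a rule are skipped in this toy example
    return phonemes


def synthesize_pulse_series(word: str) -> List[Tuple[str, float]]:
    """Return one transient-pulse description per phoneme, in order."""
    return [PULSES[p] for p in letters_to_phonemes(word)]
```

A real system would need a far larger rule set covering consonants as well; the ordering of `RULES` matters because the first matching pattern wins.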

51. A method according to any of the claims 1-5 or 39-46, used in quality-measurement of audio products, the audio 
products preferably being loudspeakers, hearing aids or telecommunication systems. 

52. A method according to any of the claims 1-5 or 39-46, used in quality measurement of acoustic conditions in a
room or in the open.

53. A system for processing an auditory signal to reduce the bandwidth of the signal with substantial retention of the
information of the signal, comprising means for extracting the transient component corresponding to an abrupt
energy change of the auditory signal, and means for detecting an envelope of the transient component, said en-
velope detection means being adapted to derive from the extracted transient component a transient signal com-
prising transient pulses with a shape representing abrupt energy changes having a rise time of at the most 2 ms.

54. A system according to claim 53, further comprising means for identifying or representing the energy changes on
the basis of the shape of the transient pulses.

55. A system according to claim 53 or 54, wherein the means for transient component extraction comprises a bandpass
filter or a highpass filter.

56. A system according to any of the claims 53-55, wherein the envelope detection means comprises a rectifier and
a lowpass filter.

57. A system according to claim 55 or 56, wherein the lower cutoff frequency of the bandpass or highpass filter is at 
least 2 kHz, such as about 3 kHz. 

58. A system according to any of the claims 55-57, wherein the upper cutoff frequency of the bandpass filter is in the 
range between 4.5 and 7 kHz, preferably about 6 kHz. 

59. A system according to any of the claims 56-58, wherein the rectifier is a one-way rectifier.

60. A system according to any of claims 56-59, wherein the cutoff frequency of the lowpass filter is in the range of 
400-1000 Hz, preferably about 700 Hz. 
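Claims 53 to 60 specify a concrete signal chain: a bandpass (or highpass) filter with a lower cutoff of about 3 kHz and an upper cutoff of about 6 kHz, a one-way rectifier, and a lowpass filter at about 700 Hz. The sketch below wires that chain together in plain Python. It assumes first-order (one-pole) filters purely for brevity; the claims do not state a filter order, and a practical implementation would likely use steeper designs. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def one_pole_lowpass(x: Sequence[float], fc: float, fs: float) -> List[float]:
    """First-order IIR lowpass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * fc / fs)
    y: List[float] = []
    state = 0.0
    for sample in x:
        state += a * (sample - state)
        y.append(state)
    return y


def one_pole_highpass(x: Sequence[float], fc: float, fs: float) -> List[float]:
    """First-order highpass formed as the input minus its one-pole lowpass."""
    low = one_pole_lowpass(x, fc, fs)
    return [s - l for s, l in zip(x, low)]


def transient_envelope(x: Sequence[float], fs: float,
                       low_cut: float = 3000.0,   # claim 57: at least 2 kHz, e.g. about 3 kHz
                       high_cut: float = 6000.0,  # claim 58: 4.5-7 kHz, preferably about 6 kHz
                       env_cut: float = 700.0     # claim 60: 400-1000 Hz, preferably about 700 Hz
                       ) -> List[float]:
    # Transient component extraction: highpass then lowpass gives a (gentle) bandpass.
    band = one_pole_lowpass(one_pole_highpass(x, low_cut, fs), high_cut, fs)
    # Envelope detection: one-way (half-wave) rectifier followed by a lowpass filter.
    rectified = [max(s, 0.0) for s in band]
    return one_pole_lowpass(rectified, env_cut, fs)
```

Fed with silence, a short 4 kHz tone burst and silence again, the returned envelope stays at zero before the burst, rises during it and decays back towards zero afterwards, which is the kind of transient pulse the claims then analyse.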

61. A system according to claim 53 or 54, wherein the envelope detection means comprises a filter bank.
Claims (German-language version, translated)

1. Use of the shape of abrupt energy changes of an auditory signal for identifying or representing features which are perceptible by an animal ear, such as a human ear, as representing a distinct sound picture, wherein the abrupt energy changes of the auditory signal are represented by a transient pulse of a transient signal derived from the auditory signal, having a rise time of at the most 2 ms, and wherein the shape of the abrupt energy changes is represented by the shape of the transient pulse.

2. Use according to claim 1, wherein the shape of a transient pulse is obtained by using envelope detection.

3. Use according to any of the preceding claims, wherein the distinct sound picture is a phoneme.

4. A method of identifying abrupt energy changes in an auditory signal which are perceptible by an animal ear, such as a human ear, as representing a distinct sound picture, the method comprising:

deriving from the auditory signal a transient signal with transient pulses representing abrupt energy changes in the auditory signal having a rise time of at the most 2 ms,
selecting a dominant pulse of these transient pulses in the signal, and
comparing the shape of the dominant pulse with predetermined transient signal pulses representing distinct sound pictures.
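Claim 4 selects a dominant pulse and compares its shape with predetermined pulses. A minimal sketch, assuming that "dominant" means the pulse with the largest peak and that shapes are compared by squared error after peak normalization and zero padding; both choices are assumptions, since the claim does not specify either, and the template names are hypothetical.

```python
from typing import Dict, List, Sequence


def dominant_pulse(pulses: Sequence[Sequence[float]]) -> Sequence[float]:
    """Pick the pulse with the largest peak amplitude (one possible reading of 'dominant')."""
    return max(pulses, key=max)


def _normalized(pulse: Sequence[float], length: int) -> List[float]:
    """Scale to unit peak and zero-pad to a common length."""
    peak = max(pulse)
    scaled = [v / peak for v in pulse] if peak > 0 else list(pulse)
    return scaled + [0.0] * (length - len(scaled))


def best_match(pulse: Sequence[float], templates: Dict[str, Sequence[float]]) -> str:
    """Return the name of the predetermined pulse closest in shape to `pulse`."""
    def distance(name: str) -> float:
        ref = templates[name]
        n = max(len(pulse), len(ref))
        a, b = _normalized(pulse, n), _normalized(ref, n)
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(templates, key=distance)
```

Because of the peak normalization, the comparison looks only at shape, not at loudness, which matches the claim's emphasis on the shape of the dominant pulse.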

5. A method according to claim 4, wherein the shape of a transient pulse is obtained by envelope detection of a transient response of the energy change in the auditory signal.

6. A method of processing an auditory signal to reduce the bandwidth of the signal with substantial retention of the information of the signal, comprising extracting the transient component corresponding to an abrupt energy change of the auditory signal, followed by detecting an envelope of the transient component, the envelope detection being performed in such a way that it obtains from the extracted transient component a transient signal with transient pulses having a shape which represents abrupt energy changes with a rise time of at the most 2 ms.

7. A method according to claim 6, wherein the transient pulse shapes of the signal which are perceptible by an animal ear, such as a human ear, as representing a distinct sound picture are identified.

8. A method according to claim 7, wherein the distinct sound picture is a phoneme.

9. A method according to claim 4 or 7, wherein the shape of the leading edge of a pulse is identified.

10. A method according to claim 9, wherein the shape of the leading edge is determined by determining the rise time, the slope and/or the change of slope of at least a part of the leading edge.

11. A method according to claim 10, wherein the rise time, slope and/or change of slope of at least the upper part of the leading edge is determined.

12. A method according to claim 11, wherein the upper part is the part starting substantially at a point where the slope is maximal.

13. A method according to claim 10, wherein the rise time, slope and/or change of slope of the leading edge is determined on the basis of at least 5 samples.
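Claims 10 to 13 characterize the leading edge by its rise time, slope and change of slope, evaluated from the point of maximum slope upwards and using at least 5 samples. A minimal sketch, assuming the leading edge is given as envelope samples from pulse onset up to and including the peak, that rise time means onset-to-peak time, and that slopes are plain first differences; these definitions are assumptions, as the claims do not fix them.

```python
from typing import Dict, Sequence


def leading_edge_features(edge: Sequence[float], fs: float) -> Dict[str, float]:
    """Rise time, maximum slope and mean change of slope of the upper leading edge."""
    if len(edge) < 5:
        raise ValueError("need at least 5 samples (cf. claim 13)")
    dt = 1.0 / fs
    slopes = [(b - a) / dt for a, b in zip(edge, edge[1:])]
    k = max(range(len(slopes)), key=slopes.__getitem__)  # point of maximum slope (claim 12)
    upper = slopes[k:]                                   # upper part of the leading edge
    changes = [(b - a) / dt for a, b in zip(upper, upper[1:])]
    return {
        "rise_time_ms": (len(edge) - 1) * dt * 1000.0,   # assumed: onset-to-peak time
        "max_slope": slopes[k],
        "mean_upper_slope": sum(upper) / len(upper),
        "mean_slope_change": sum(changes) / len(changes) if changes else 0.0,
    }
```

A negative mean change of slope indicates a leading edge that flattens out towards the peak; its magnitude is one way to distinguish rounded from abrupt pulse tops.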

14. A method according to any of claims 9 to 13, wherein the identification of the shape of the leading edge is performed using a comparison with a reference library.

15. A method according to claim 14, wherein the references with which the comparison is performed are selected on the basis of the rise time of the leading edge.

16. A method according to claim 4 or 7, wherein the duration of a pulse is identified.

17. A method according to claim 16, wherein the duration of a pulse is determined as the distance from the leading edge to the trailing edge at a predetermined amplitude.

18. A method according to claim 17, wherein the predetermined amplitude is an amplitude of at the most 50% of the maximum amplitude of the pulse.
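Claims 17 and 18 define pulse duration as the distance between leading and trailing edge at a predetermined amplitude of at most 50% of the peak. A minimal sketch, assuming the pulse is given as envelope samples and that sample-level resolution (no interpolation between samples) is acceptable; `pulse_duration_ms` is a hypothetical name.

```python
from typing import Sequence


def pulse_duration_ms(pulse: Sequence[float], fs: float, fraction: float = 0.5) -> float:
    """Width of the pulse at `fraction` of its peak, in milliseconds."""
    if not 0.0 < fraction <= 0.5:
        raise ValueError("claim 18 uses a level of at most 50% of the peak")
    peak_idx = max(range(len(pulse)), key=pulse.__getitem__)
    level = fraction * pulse[peak_idx]
    start = peak_idx
    while start > 0 and pulse[start - 1] >= level:               # walk back along the leading edge
        start -= 1
    end = peak_idx
    while end < len(pulse) - 1 and pulse[end + 1] >= level:      # walk forward along the trailing edge
        end += 1
    return (end - start) * 1000.0 / fs
```

At a 10 kHz sampling rate one sample corresponds to 0.1 ms, so this resolution is just fine enough to separate the 1.1 ms and 1.3 ms boundaries used in claims 39 and 42.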

19. A method according to any of claims 9 to 18, wherein pulses which are not perceptible by the animal ear are excluded from the identification.

20. A method according to claim 19, wherein a pulse whose leading edge has an amplitude of less than 50% of the amplitude of the preceding pulse and which has a duration of less than 3.5 ms is disregarded.
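The exclusion rule of claim 20 can be applied as a simple filter over a sequence of measured pulses. A minimal sketch, assuming each pulse is summarized by its leading-edge amplitude and duration, and that "the preceding pulse" means the immediately preceding pulse in the input sequence (whether or not that pulse itself was kept); the `Pulse` type is hypothetical.

```python
from typing import List, NamedTuple


class Pulse(NamedTuple):
    amplitude: float    # leading-edge amplitude
    duration_ms: float  # pulse duration


def drop_masked(pulses: List[Pulse]) -> List[Pulse]:
    """Remove pulses that claim 20 treats as not perceptible."""
    kept: List[Pulse] = []
    for i, p in enumerate(pulses):
        if i > 0:
            prev = pulses[i - 1]
            if p.amplitude < 0.5 * prev.amplitude and p.duration_ms < 3.5:
                continue  # weak and short relative to its predecessor: ignored
        kept.append(p)
    return kept
```

Both conditions must hold, so a weak pulse that lasts 3.5 ms or longer is still passed on to the identification step.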

21. A method according to any of claims 9 to 20, wherein the shape of the trailing edge of a pulse is identified.

22. A method according to claim 21, wherein the shape of the trailing edge is determined by determining the fall time, slope and/or change of slope of at least a part of the trailing edge.

23. A method according to any of claims 9 to 21, wherein the time interval between leading edges of pulses which are perceptible by the animal ear is determined.

24. A method according to claim 23, wherein the time interval between leading edges which are at a distance of at least 3 ms from each other is determined.
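Claims 23 and 24 measure the time between leading edges that are at least 3 ms apart. A minimal sketch, assuming leading-edge times are already known in milliseconds and that closely spaced edges are resolved greedily (an edge closer than 3 ms to the last accepted edge is skipped); this greedy reading is an assumption, not stated in the claims.

```python
from typing import List, Sequence


def leading_edge_intervals(onsets_ms: Sequence[float], min_gap_ms: float = 3.0) -> List[float]:
    """Intervals between successive accepted leading edges, in ms."""
    kept: List[float] = []
    for t in sorted(onsets_ms):
        if not kept or t - kept[-1] >= min_gap_ms:
            kept.append(t)
    return [b - a for a, b in zip(kept, kept[1:])]
```

For leading edges at 0, 1, 4, 5.5 and 9 ms, only the edges at 0, 4 and 9 ms are accepted, giving intervals of 4 ms and 5 ms.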

25. A method of telecommunicating an auditory signal, comprising processing the signal by the method according to any of claims 6 to 24, transmitting the processed signal, and receiving the processed signal by a receiver.

26. A method according to claim 25, wherein, prior to transmission of the processed signal, the signal is encoded in a digital representation and the encoded signal is decoded in the receiver in order to restore the transient pulse shapes which are perceptible by the animal ear, such as the human ear, as representing the distinct sound pictures of the auditory signal.

27. A method according to claim 26, wherein the digital transmission is performed with a bandwidth of at the most 4000 bits per second.

28. A method according to claim 27, wherein the bandwidth is at the most 2000 bits per second.

29. A method according to claim 28, wherein the bandwidth is in the interval of 800-2000 bits per second.

30. A method according to any of claims 26 to 29, wherein the digital information comprises information about the leading edge, the trailing edge and the duration of the transient pulse.

31. A method according to any of claims 26 to 30, wherein a second and further pulses in a series of identical pulses are represented by a digital symbol indicating a repetition.
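Claims 30 and 31 describe what the digital representation carries (leading edge, trailing edge, duration) and how runs of identical pulses are shortened with a repetition symbol. A minimal sketch, assuming each pulse has already been quantized into a hypothetical integer triple and assuming a hypothetical framing of 1 flag bit per item plus 12 bits for a full descriptor; the source gives no bit layout.

```python
from typing import List, Sequence, Tuple, Union

# Hypothetical quantized codes for (leading edge, trailing edge, duration), cf. claim 30.
Descriptor = Tuple[int, int, int]
REPEAT = "R"  # hypothetical symbol meaning "same pulse as the previous one", cf. claim 31
Item = Union[Descriptor, str]


def encode(pulses: Sequence[Descriptor]) -> List[Item]:
    """Replace the second and further pulses of a run of identical pulses with REPEAT."""
    return [REPEAT if i > 0 and p == pulses[i - 1] else p for i, p in enumerate(pulses)]


def decode(stream: Sequence[Item]) -> List[Descriptor]:
    """Expand REPEAT symbols back into copies of the previous descriptor."""
    out: List[Descriptor] = []
    for item in stream:
        out.append(out[-1] if item == REPEAT else item)
    return out


def bits_per_second(stream: Sequence[Item], seconds: float, descriptor_bits: int = 12) -> float:
    """Assumed framing: 1 flag bit per item, plus descriptor_bits for a full descriptor."""
    bits = sum(1 if item == REPEAT else 1 + descriptor_bits for item in stream)
    return bits / seconds
```

With this assumed 13-bit framing, 100 distinct pulses per second would need 1300 bit/s, inside the 800-2000 bit/s interval of claim 29; the figures are illustrative only.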

32. A method according to any of claims 6 to 24, wherein the extraction of the transient component comprises bandpass filtering or highpass filtering.

33. A method according to any of claims 6 to 24 or 32, wherein the envelope detection comprises rectification and lowpass filtering.

34. A method according to claim 32, wherein the lower cutoff frequency of the bandpass or highpass filtering is at least 2 kHz, such as about 3 kHz.

35. A method according to claim 32 or 34, wherein the upper cutoff frequency is in the range between 4.5 and 7 kHz, preferably about 6 kHz.

36. A method according to claim 33, wherein the rectification is a one-way rectification.

37. A method according to claim 33 or 36, wherein the cutoff frequency of the lowpass filtering is in the range of 400-1000 Hz, preferably about 700 Hz.

38. A method according to any of claims 6 to 24 or 32, wherein the envelope detection comprises bandpass filtering using a filter bank.

39. A method of identifying or representing the phoneme "i" as in "heat", comprising identifying or generating a transient pulse with a rise time of the leading edge of at the most 0.5 ms and a duration of at the most 1.1 ms.

40. A method according to claim 39, wherein the rise time of the leading edge is at the most 0.4 ms, preferably at the most 0.3 ms.

41. A method according to claim 39 or 40, wherein the duration is at the most 1.0 ms, preferably about 0.8 ms.

42. A method of identifying or representing the phoneme "o" as in "hop", comprising identifying or generating a transient pulse with a rise time of the leading edge of at the most 0.5 ms and a duration of 1.3-1.8 ms.

43. A method according to claim 42, wherein the rise time of the leading edge is at the most 0.4 ms, preferably at the most 0.3 ms.

44. A method according to claim 39 or 40, wherein the fall time of the trailing edge is at the most 0.5 ms, preferably at the most 0.4 ms and more preferably at the most 0.3 ms.
45. A method of identifying or representing the phoneme "o" as in the English word "ongaonga" or the Danish word "Ole", comprising identifying or generating a transient pulse with a rise time of the leading edge of at the most 0.5 ms and a duration of 1.3-1.8 ms.

46. A method of identifying or representing the phoneme "u" as in the English word "who", comprising identifying or generating a transient pulse with a sine curve interpolation and a duration of 1.0-2.0 ms, preferably about 1.5 ms.

47. A method according to any of claims 1 to 24 or 39 to 46, which is used in speech recognition.

48. A method according to any of claims 1 to 5 or 39 to 46, which is used in speech compression.

49. A method according to any of claims 1 to 5 or 39 to 46, which is used for synthesizing human speech, comprising generating a series of transient pulses corresponding to the series of phonemes which constitute the speech to be synthesized.

50. A method according to claim 49, wherein the series of phonemes is built up from a series of letters using rule-based conversion.

51. A method according to any of claims 1 to 5 or 39 to 46, which is used in the quality measurement of audio products, the audio products preferably being loudspeakers, hearing aids or telecommunication systems.

52. A method according to any of claims 1 to 5 or 39 to 46, which is used in the quality measurement of acoustic conditions in a room or in the open.

53. A system for processing an auditory signal to reduce the bandwidth of the signal with substantial retention of the information of the signal, comprising means for extracting the transient component corresponding to an abrupt energy change of the auditory signal, and means for detecting an envelope of the transient component, the envelope detection means being adapted to derive from the extracted transient component a transient signal with transient pulses having a shape which represents abrupt energy changes with a rise time of at the most 2 ms.

54. A system according to claim 53, comprising means for identifying or representing the energy changes on the basis of the shape of the transient pulses.

55. A system according to claim 53 or 54, wherein the means for transient component extraction comprises a bandpass filter or a highpass filter.

56. A system according to any of claims 53 to 55, wherein the envelope detection means comprises a rectifier and a lowpass filter.

57. A system according to claim 55 or 56, wherein the lower cutoff frequency of the bandpass or highpass filter is at least 2 kHz, such as about 3 kHz.

58. A system according to any of claims 55 to 57, wherein the upper cutoff frequency of the bandpass filter is in the range between 4.5 and 7 kHz, preferably about 6 kHz.

59. A system according to any of claims 56 to 58, wherein the rectifier is a one-way rectifier.

60. A system according to any of claims 56 to 59, wherein the cutoff frequency of the lowpass filter is in the range of 400-1000 Hz, preferably about 700 Hz.

61. A system according to claim 53 or 54, wherein the envelope detection means comprises a filter bank.



Claims (French-language version, translated)

1. Use of the shape of abrupt energy changes of an auditory signal for identifying or representing features which can be perceived by an animal ear, such as a human ear, as representing a distinct sound picture, said abrupt energy changes of the auditory signal being represented by a transient pulse of a transient signal derived from the auditory signal, having a rise time of at the most 2 ms, and said shape of the abrupt energy changes being represented by the shape of the transient pulse.

2. Use according to claim 1, wherein the shape of a transient pulse is obtained using envelope detection.

3. Use according to any one of the preceding claims, wherein the distinct sound picture is a phoneme.

4. A method for identifying, in an auditory signal, abrupt energy changes which can be perceived by an animal ear, such as a human ear, as representing a distinct sound picture, the method comprising

♦ deriving, from the auditory signal, a transient signal comprising transient pulses representing abrupt energy changes of the auditory signal having a rise time of at the most 2 ms,

♦ selecting a dominant pulse of these transient pulses in the signal, and

♦ comparing the shape of the dominant pulse with predetermined transient signal pulses representing distinct sound pictures.

5. A method according to claim 4, wherein the shape of a transient pulse is obtained by envelope detection of a transient response of the energy change in the auditory signal.

6. A method of processing an auditory signal to reduce the bandwidth of the signal with substantial retention of the information of the signal, comprising extracting the transient component corresponding to an abrupt energy change of the auditory signal, followed by detecting an envelope of the transient component, said envelope detection being performed so as to obtain, from the extracted transient component, a transient signal comprising transient pulses with a shape representing abrupt energy changes having a rise time of at the most 2 ms.

7. A method according to claim 6, wherein the transient pulse shapes of the signal which can be perceived by an animal ear, such as a human ear, as representing a distinct sound picture are identified.

8. A method according to claim 7, wherein the distinct sound picture is a phoneme.

9. A method according to claim 4 or 7, wherein the shape of the leading edge of a pulse is identified.

10. A method according to claim 9, wherein the shape of the leading edge is determined by determining the rise time, the slope and/or the change of slope of at least a part of the leading edge.

11. A method according to claim 10, wherein the rise time, the slope and/or the change of slope of at least the upper part of the leading edge are determined.

12. A method according to claim 11, wherein the upper part is the part starting substantially at a point where the slope is maximal.

13. A method according to claim 10, wherein the rise time, the slope and/or the change of slope of the leading edge are determined on the basis of at least 5 samples.

14. A method according to any one of claims 9 to 13, wherein the identification of the shape of the leading edge is performed using a comparison with a reference library.

15. A method according to claim 14, wherein the references with which the comparison is performed are selected on the basis of the rise time of the leading edge.

16. A method according to claim 4 or 7, wherein the duration of a pulse is identified.

17. A method according to claim 16, wherein the duration of a pulse is determined as the distance between the leading edge and the trailing edge at a predetermined amplitude.

18. A method according to claim 17, wherein the predetermined amplitude is an amplitude of at the most 50% of the maximum amplitude of the pulse.

19. A method according to any one of claims 9 to 18, wherein pulses which cannot be perceived by the animal ear are eliminated from the identification.

20. A method according to claim 19, wherein a pulse whose leading edge has an amplitude of less than 50% of the amplitude of the preceding pulse and a duration of less than 3.5 ms is ignored.

21. A method according to any one of claims 9 to 20, wherein the shape of the trailing edge of a pulse is identified.

22. A method according to claim 21, wherein the shape of the trailing edge is determined by determining the fall time, the slope and/or the change of slope of at least a part of the trailing edge.

23. A method according to any one of claims 9 to 21, wherein the time between leading edges of pulses which can be perceived by the animal ear is determined.

24. A method according to claim 23, wherein the time between leading edges which are at least 3 ms apart is determined.

25. A method of telecommunicating an auditory signal, comprising processing the signal by the method according to any one of claims 6 to 24, transmitting the processed signal and receiving the processed signal by a receiver.

26. A method according to claim 25, wherein, before transmission of the processed signal, the signal is encoded in a digital representation and the encoded signal is decoded in the receiver so as to restore transient pulse shapes perceived by the animal ear, such as the human ear, as representing the distinct sound pictures of the auditory signal.

27. A method according to claim 26, wherein the digital transmission is performed with a bandwidth of at the most 4000 bits per second.

28. A method according to claim 27, wherein the bandwidth is at the most 2000 bits per second.

29. A method according to claim 28, wherein the bandwidth lies in the interval from 800 to 2000 bits per second.

30. A method according to any one of claims 26 to 29, wherein the digital information comprises information concerning the leading edge, the trailing edge and the duration of the transient pulse.

31. A method according to any one of claims 26 to 30, wherein the second pulse and the following pulses of a sequence of identical pulses are represented by a digital sign indicating the repetition.

32. A method according to any one of claims 6 to 24, wherein the extraction of the transient components comprises bandpass filtering or highpass filtering.

33. A method according to any one of claims 6 to 24 or 32, wherein the envelope detection comprises rectification and lowpass filtering.

34. A method according to claim 32, wherein the lower cutoff frequency of the bandpass or highpass filtering is at least 2 kHz, for example about 3 kHz.

35. A method according to claim 32 or 34, wherein the upper cutoff frequency lies in the range between 4.5 and 7 kHz, preferably about 6 kHz.

36. A method according to claim 33, wherein the rectification is a one-way rectification.

37. A method according to claim 33 or 36, wherein the cutoff frequency of the lowpass filtering lies in the range from 400 to 1000 Hz, preferably about 700 Hz.

38. A method according to any one of claims 6 to 24 or 32, wherein the envelope detection comprises bandpass filtering using a filter bank.

39. A method of identifying or representing the phoneme "i" as in "heat", comprising identifying or generating a transient pulse with a rise time of the leading edge of at the most 0.5 ms and a duration of at the most 1.1 ms.

40. A method according to claim 39, wherein the rise time of the leading edge is at the most 0.4 ms, preferably at the most 0.3 ms.

41. A method according to claim 39 or 40, wherein the duration is at the most 1.0 ms, preferably about 0.8 ms.

42. A method of identifying or representing the phoneme "o" as in "hop", comprising identifying or generating a transient pulse with a rise time of the leading edge of at the most 0.5 ms and a duration of 1.3 to 1.8 ms.

43. A method according to claim 42, wherein the rise time of the leading edge is at the most 0.4 ms, preferably at the most 0.3 ms.

44. A method according to claim 39 or 40, wherein the fall time of the trailing edge is at the most 0.5 ms, preferably at the most 0.4 ms and more preferably at the most 0.3 ms.

45. A method of identifying or representing the phoneme "o" as in the English word "ongaonga" or the Danish word "Ole", comprising identifying or generating a transient pulse with a rise time of the leading edge of at the most 0.5 ms and a duration of 1.3 to 1.8 ms.

46. A method of identifying or representing the phoneme "u" as in the English word "who", comprising identifying or generating a transient pulse with a sine curve interpolation and a duration of 1.0 to 2.0 ms, preferably about 1.5 ms.

47. A method according to any one of claims 1 to 24 or 39 to 46, when used for speech recognition.

48. A method according to any one of claims 1 to 5 or 39 to 46, used for speech compression.

49. A method according to any one of claims 1 to 5 or 39 to 46, when used for synthesizing human speech, comprising generating a series of transient pulses corresponding to the series of phonemes constituting the speech to be synthesized.

50. A method according to claim 49, wherein the series of phonemes is established from a series of letters using rule-based conversion.

51. A method according to any one of claims 1 to 5 or 39 to 46, used for quality measurement of audio products, the audio products preferably being loudspeakers, hearing aids or telecommunication systems.

52. A method according to any one of claims 1 to 5 or 39 to 46, used for quality measurement of acoustic conditions in a room or outdoors.

53. A system for processing an auditory signal to reduce the bandwidth of the signal with substantial retention of the information of the signal, comprising means for extracting the transient component corresponding to an abrupt energy change of the auditory signal and means for detecting an envelope of the transient component, said envelope detection means being adapted to derive, from the extracted transient component, a transient signal comprising transient pulses with a shape representing abrupt energy changes having a rise time of at the most 2 ms.

54. A system according to claim 53, further comprising means for identifying or representing the energy changes on the basis of the shape of the transient pulses.

55. A system according to claim 53 or 54, wherein the transient component extraction means comprise a bandpass filter or a highpass filter.

56. A system according to any one of claims 53 to 55, wherein the envelope detection means comprise a rectifier and a lowpass filter.

57. A system according to claim 55 or 56, wherein the lower cutoff frequency of the bandpass or highpass filter is at least 2 kHz, for example about 3 kHz.

58. A system according to any one of claims 55 to 57, wherein the upper cutoff frequency of the bandpass filter lies in the range between 4.5 and 7 kHz, preferably about 6 kHz.

59. A system according to any one of claims 56 to 58, wherein the rectifier is a one-way rectifier.

60. A system according to any one of claims 56 to 59, wherein the cutoff frequency of the lowpass filter lies in the range from 400 to 1000 Hz, preferably about 700 Hz.

61. A system according to claim 53 or 54, wherein the envelope detection means comprise a filter bank.
[Figure pages: Fig. 12, Fig. 14a, Fig. 14b, Fig. 14c, Fig. 15b, Fig. 15c, Fig. 16b, Fig. 18. Recoverable panel labels: "Speech signal", "Pretransient signal", "Transient signal", "Percept. Normalized."]